Abstract:Microservice architecture is gradually adopted by more and more applications. How to effectively detect and locate faults is a key technology to guarantee the performance and reliability of microservices. Current approaches typically monitor physical metrics, and manually set alarm rules according to the domain knowledge. However, these approaches cannot automatically detect faults and locate root causes in fine granularity. To address the above issues, this work proposes a fault diagnosis approach for microservices based on the execution trace monitoring. First, dynamic instrumentation is used to monitor the execution traces crossing service components, and then call trees are used to describe the execution traces of user requests. Second, for the faults affecting the structure of execution traces, the tree edit distance is used to assess the abnormality degree of processing requests, and the method calls leading to failures are located by analyzing the difference between execution traces. Third, for the performance anomalies leading to the response delay, principal component analysis is used to extract the key method invocations causing unusual fluctuations in performance metrics. Experimental results show that this new approach can accurately characterize the execution trace of processing requests, and locate the methods that cause system failures and performance anomalies.