Abstract: Databases are foundational components of computer services. However, performance anomalies may occur during their operation and degrade the quality of business services. Diagnosing performance anomalies in databases has therefore become an active research topic in both industry and academia. Recently, a series of automated database anomaly diagnosis methods have been proposed. They analyze the runtime status of a database to determine the anomaly type of the database as a whole. However, as data volumes continue to grow, distributed databases are becoming an increasingly popular solution in industry. For a distributed database composed of multiple nodes, existing diagnosis methods lack effective diagnostic capabilities: they struggle to localize anomalies to specific nodes, fail to identify compound anomalies that span multiple nodes, and cannot capture the complex ways in which nodes influence one another's performance. To address these challenges, this study proposes DistDiagnosis, a compound anomaly diagnosis method for distributed databases. DistDiagnosis models the anomalous state of a distributed database as a Compound Anomaly Graph, which represents the anomalies at each node and also captures the correlations between nodes. It further introduces a node correlation-aware root cause anomaly ranking method, which locates root cause anomalies according to each node's influence on the database. In this study, anomaly test cases covering various scenarios are constructed on OceanBase, a distributed database developed in China. Experimental results show that DistDiagnosis outperforms other state-of-the-art baselines, achieving AC@1, AC@3, and AC@5 of 0.97, 0.98, and 0.98, respectively. Compared with the second-best method, DistDiagnosis improves accuracy by up to 5.20%, 5.45%, and 4.46% in the respective diagnostic scenarios.
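AC@k is commonly used in root cause localization to measure how often the true root cause appears among the top-k ranked candidates. The sketch below assumes the multi-root-cause variant of the metric, in which hits in the top k are divided by min(k, number of true root causes) so that compound anomalies with several root causes are scored fairly; the exact definition used in the experiments may differ. The function name `ac_at_k` and the candidate labels (e.g. `"n1:cpu"`) are hypothetical and for illustration only.

```python
from typing import Sequence, Set


def ac_at_k(rankings: Sequence[Sequence[str]],
            truths: Sequence[Set[str]],
            k: int) -> float:
    """Average AC@k over test cases (assumed multi-root-cause definition).

    rankings[i] is the ranked list of candidate root-cause anomalies for case i;
    truths[i] is the set of true root-cause anomalies for case i.
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    if not rankings or len(rankings) != len(truths):
        raise ValueError("need one ranking and one truth set per test case")
    total = 0.0
    for ranked, roots in zip(rankings, truths):
        if not roots:
            raise ValueError("each test case needs at least one true root cause")
        # Count true root causes that appear among the top-k candidates.
        hits = sum(1 for cand in ranked[:k] if cand in roots)
        # Normalize so a case with fewer than k root causes can still score 1.0.
        total += hits / min(k, len(roots))
    return total / len(rankings)


# Hypothetical example: three test cases, the second one a compound anomaly.
rankings = [
    ["n1:cpu", "n2:io", "n3:lock"],
    ["n2:io", "n1:cpu", "n3:net"],
    ["n3:net", "n1:cpu", "n2:io"],
]
truths = [{"n1:cpu"}, {"n1:cpu", "n3:net"}, {"n2:io"}]

for k in (1, 2, 3):
    print(f"AC@{k} = {ac_at_k(rankings, truths, k):.3f}")
```

With these made-up rankings, AC@1 is 1/3 because only the first case ranks a true root cause first, while AC@3 reaches 1.0 because every true root cause appears within the top three candidates of its case.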