Abstract: Databases are foundational components of computer services, and performance anomalies in them can degrade service quality. Diagnosing such anomalies has therefore become an active research problem in both industry and academia. Recently, a series of automated anomaly diagnosis methods have been proposed; they analyze the runtime status of a database and identify the most likely anomalies. However, as data volumes grow, distributed databases are becoming increasingly popular in enterprises. In a distributed database composed of multiple nodes, existing methods struggle to localize anomalies to the individual nodes on which they occur and fail to identify compound anomalies that span multiple nodes, which limits their diagnostic capability. To address these challenges, we propose DistDiagnosis, an anomaly diagnosis method for compound anomalies in distributed databases. It models the anomalous state of a distributed database as a Compound Anomaly Graph, which represents the anomalies on each node and also captures the correlations between nodes. DistDiagnosis further introduces a correlation-aware root cause ranking method that locates root-cause anomalies based on the relationships among nodes. We construct anomaly test cases for different scenarios on OceanBase, a domestically developed distributed database. Experimental results show that DistDiagnosis outperforms state-of-the-art (SOTA) baselines, achieving AC@1, AC@3, and AC@5 values of 0.97, 0.98, and 0.98, respectively. Compared with the second-best method, DistDiagnosis improves these three metrics by up to 5.20%, 5.45%, and 4.46%, respectively.
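The abstract reports accuracy as AC@1, AC@3, and AC@5 without defining AC@k. One common definition in root cause analysis work, assumed here rather than taken from this paper, averages over test cases the number of true root causes found in the top-k ranked candidates, divided by min(k, number of true root causes); the normalization lets a case with a compound anomaly (several root causes) still score 1.0. The function `ac_at_k` and the anomaly labels such as `cpu@node1` are hypothetical and only illustrate the metric.

```python
from typing import Sequence, Set


def ac_at_k(rankings: Sequence[Sequence[str]], truths: Sequence[Set[str]], k: int) -> float:
    """Assumed AC@k: mean over cases of |top-k hits| / min(k, |true root causes|)."""
    if k < 1:
        raise ValueError("k must be at least 1")
    if not rankings or len(rankings) != len(truths):
        raise ValueError("need one non-empty ranking list per ground-truth set")
    total = 0.0
    for ranked, truth in zip(rankings, truths):
        if not truth:
            raise ValueError("each case needs at least one true root cause")
        # Count how many of the top-k ranked candidates are true root causes.
        hits = sum(1 for cause in ranked[:k] if cause in truth)
        # Normalize so a case with fewer than k root causes can still reach 1.0.
        total += hits / min(k, len(truth))
    return total / len(rankings)


# Hypothetical diagnosis output for two test cases; the second is a compound anomaly.
rankings = [
    ["cpu@node1", "io@node2", "lock@node1"],
    ["mem@node1", "io@node2", "cpu@node3"],
]
truths = [{"cpu@node1"}, {"io@node2", "cpu@node3"}]

print(ac_at_k(rankings, truths, 1))  # 0.5
print(ac_at_k(rankings, truths, 2))  # 0.75
print(ac_at_k(rankings, truths, 3))  # 1.0
```

Under this definition, AC@k is non-decreasing in k, which is consistent with the reported ordering of 0.97, 0.98, and 0.98.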