Abstract:With the increasing number of logs produced by nodes in CNGrid, traditional manual methods for abnormal log analysis can no longer meet the need of daily analysis. This study proposed a method to define the abnormal log traffic pattern: The orderly arrangement of log types in the same node and at the same time slice represents a log traffic pattern. Based on this method, a log traffic pattern detection method was implemented, which was applied in automatically mine of abnormal log traffic pattern. The method starts with system log and classifies automatically according to the text similarity of log content. Then, the frequency of each types of log in the same time slice is taken as the input feature, and the anomaly detection method based on principal component analysis (PCA) is used to detect the abnormal input, and a large number of abnormal log type sequences are obtained. A distance metric based on the longest common subsequence is used to cluster these sequences by hierarchical clustering method. The clustering results are used with the adaptive K-itemset algorithm to get the deputies of the abnormal log flow modes. The above method was used to analyze the logs generated in the national high performance computing environment CNGrid in half a year according to different time periods (morning, night, midnight), and has obtained the abnormal log traffic patterns and their relationships in different time periods. The method can also be extended to the system logs of other distributed systems.