Abstract: Anomaly detection is an important research area in data mining. Current outlier mining approaches based on distance or nearest neighbors can incur unmanageably long running times when applied to massive high-dimensional data. Many improvements to these algorithms have been proposed, but their detection remains ineffective. This paper presents a new anomaly detection algorithm based on the local distance of density-based sampled data. First, a density-based probability sampling method is used to select a subset of the data for detection. Then, a local outlier detection method based on local distance is used to calculate the abnormal factor of each object in the subset. Because the sample is density-based, the abnormal factor serves both as the local outlier factor within the subset and as an approximation of the global outlier factor over the whole data. Data points with a higher abnormal factor have a higher degree of outlierness. Experimental results show that, compared with existing algorithms, this algorithm achieves higher detection accuracy and less computation time. The algorithm also has higher efficiency and stronger scalability across various dimensions and data sizes.
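The two steps named in the abstract can be illustrated with a minimal Python sketch. It is not the paper's implementation: the abstract does not specify the sampling probabilities or the exact abnormal factor, so the sketch assumes (a) grid-cell counts as the density estimate, with sampling probability proportional to the cell count raised to a negative exponent, and (b) an LDOF-style ratio (mean distance to the k nearest neighbors divided by the mean pairwise distance among those neighbors) as the local-distance score. The function names, `bins`, `exponent`, and `k` are all hypothetical.

```python
import numpy as np

def density_biased_sample(X, n_samples, bins=8, exponent=0.5, seed=0):
    """Sample row indices with probability proportional to count^(-exponent).

    Assumption: the number of points sharing a grid cell stands in for
    the density estimate; the paper's estimator may differ.
    """
    rng = np.random.default_rng(seed)
    lo, hi = X.min(axis=0), X.max(axis=0)
    span = np.where(hi > lo, hi - lo, 1.0)          # avoid division by zero
    cells = np.minimum(((X - lo) / span * bins).astype(int), bins - 1)
    _, inverse, counts = np.unique(
        cells, axis=0, return_inverse=True, return_counts=True
    )
    weights = counts[inverse.ravel()].astype(float) ** (-exponent)
    probs = weights / weights.sum()
    return rng.choice(len(X), size=n_samples, replace=False, p=probs)

def local_distance_factor(S, k=5):
    """LDOF-style score for each row of S (assumed form of the abnormal factor).

    Requires 2 <= k < len(S). Builds the full pairwise distance matrix,
    so it is meant for the sampled subset, not the whole data set.
    """
    diff = S[:, None, :] - S[None, :, :]
    D = np.sqrt((diff ** 2).sum(axis=-1))
    D_noself = D.copy()
    np.fill_diagonal(D_noself, np.inf)              # exclude each point itself
    nn = np.argsort(D_noself, axis=1)[:, :k]        # indices of k nearest neighbors
    scores = np.empty(len(S))
    for i, idx in enumerate(nn):
        d_knn = D[i, idx].mean()                    # mean distance to neighbors
        d_inner = D[np.ix_(idx, idx)].sum() / (k * (k - 1))  # mean among neighbors
        scores[i] = d_knn / max(d_inner, 1e-12)     # guard against duplicates
    return scores

def detect(X, n_samples, k=5, seed=0):
    """Sample, score the subset, and return indices ranked by descending score."""
    idx = density_biased_sample(X, n_samples, seed=seed)
    scores = local_distance_factor(X[idx], k=k)
    order = np.argsort(-scores)
    return idx[order], scores[order]
```

With a negative exponent, points in sparse regions are more likely to be retained, which is one plausible way for a sample to keep candidate outliers while thinning dense clusters. Whether the paper biases the sample this way is not stated in the abstract.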