[关键词]
[摘要]
离群点检测问题作为数据挖掘的一个重要任务,在众多领域中得到了应用.近年来,基于数据流数据的挖掘算法研究受到越来越多的重视.为了解决数据流数据中的离群点检测问题,提出了一种基于数据空间动态网格划分的快速数据流离群点检测算法.算法利用动态网格对空间中的稠密和稀疏区域进行划分,过滤处于稠密区域的大量主体数据,有效地减少了算法所需考察的数据对象的规模.而对于稀疏区域中的候选离群点,采用近似方法计算其离群度,具有高离群度的数据作为离群点输出.在保证一定精确度的条件下,算法的运行效率可以得到大幅度提高.对模拟数据集和真实数据集的实验检测均验证了该算法具有良好的适用性和有效性.
[Key word]
[Abstract]
As an important task of data mining, outlier detection has been applied to many fields. Recently, research on mining in data stream is receiving more and more attention. For solving outlier detection in data stream, a new fast outlier detection algorithm is presented. Based on dynamically grid partitioning data space, the method separates dense areas from sparse areas. Data in dense areas are filtered simply, which reduces greatly the size of objects the algorithm should consider. Outliernesses of candidates in sparse areas are approximated efficiently. Data with high outlierness are outputted as outliers. Results of experiments on synthetic and real data sets show promising availabilities of the approaches.
[中图分类号]
[基金项目]
Supported by the National Natural Science Foundation of China under Grant No.60572112(国家自然科学基金)