[关键词]
[摘要]
提出了EMicro算法,以解决不确定数据流上的聚类问题.与现有技术大多仅考虑元组间的距离不同,EMicro算法综合考虑了元组之间的距离与元组自身不确定性这两个因素,同时定义新标准来描述聚类结果质量.还提出了离群点处理机制,系统同时维护两个缓冲区,分别存放正常的微簇与潜在的离群点微簇,以期得到理想的性能.实验结果表明,与现有工作相比,EMicro的效率更高,且效果良好.
[Key word]
[Abstract]
This paper proposes a novel algorithm, named EMicro, to cluster uncertain data streams. Although most of the works used today mainly use the distance metric to describe the cluster quality, EMicro considers distance metric and data uncertainty together to measure the clustering quality. Another contribution of this paper is the outlier processing mechanism. Two buffers are maintained to reserve normal micro-clusters and potential outlier micro-clusters, respectively, to obtain good performance. Experimental results show that EMicro outperforms existing methods in efficiency and effectiveness.
[中图分类号]
[基金项目]
Supported by the National Natural Science Foundation of China under Grant Nos.60933001, 60803020 (国家自然科学基金); the National Science Foundation for Distinguished Young Scholars of China under Grant No.60925008 (国家杰出青年基金项目); the Shanghai Leading Academic Disc