基于概率数据流的有效聚类算法
DOI:
CSTR:
作者:
作者单位:

作者简介:

通讯作者:

中图分类号:

基金项目:

Supported by the National Basic Research Program of China under Grant No.2005CB321905 (国家重点基础研究发展计划(973))


Effective Clustering Algorithm for Probabilistic Data Stream
Author:
Affiliation:

Fund Project:

  • 摘要
  • |
  • 图/表
  • |
  • 访问统计
  • |
  • 参考文献
  • |
  • 相似文献
  • |
  • 引证文献
  • |
  • 资源附件
  • |
  • 文章评论
    摘要:

    提出一种在概率数据流上进行聚类的有效方法P-Stream.P-Stream针对数据流上的概率元组提出强簇、过渡簇和弱簇的概念,设计一种有效的在线候选簇选择策略,为每个不断到达的数据元组合理地找到可能归属的簇,并在每个检查点存储微簇快照,以便离线进一步高层聚类和演化分析.最后设计一个“积极”的二层聚类模型来判断现有的第1层聚类模型是否还适应数据流中最近到达的概率元组.实验采用KDD-CUP’98和KDD-CUP’99真实数据集以及变换高斯分布的人工数据集构造概率数据流.实验结果表明,P-Stream具有良好的聚类质量、较快的处理速度,能够有效地适应数据演化情况.

    Abstract:

    An effective clustering algorithm called “P-Stream” for probabilistic data stream is developed in this paper for the first time. For the uncertain tuples in the data stream, the concepts of strong cluster, transitional clusters and weak cluster are proposed in the P-Stream. With these concepts, an effective strategy of choosing candidate cluster is designed, which can find the sound cluster for every continuously arriving data point. Then, in order to further cluster on the high level and analyze the evolving behaviors of data streams, snapshots of micro-clusters are stored at every checkpoint. At last, an “aggressive” two-tier clustering model is introduced to judge whether the most recently arrived data point is fitting in with the first level clustering model or not. Probabilistic data streams in the experiments include KDD-CUP’98 and KDD-CUP’99 real data sets and synthetic data sets with changing Gaussian distributions. Comprehensive experimental results demonstrate that P-Stream is of high quality, fast processing rate and is efficiently fitting in with the evolving situations of data streams.

    参考文献
    相似文献
    引证文献
引用本文

戴东波,赵杠,孙圣力.基于概率数据流的有效聚类算法.软件学报,2009,20(5):1313-1328

复制
分享
文章指标
  • 点击次数:
  • 下载次数:
  • HTML阅读次数:
  • 引用次数:
历史
  • 收稿日期:2007-11-13
  • 最后修改日期:2008-03-06
  • 录用日期:
  • 在线发布日期:
  • 出版日期:
文章二维码
您是第位访问者
版权所有:中国科学院软件研究所 京ICP备05046678号-3
地址:北京市海淀区中关村南四街4号,邮政编码:100190
电话:010-62562563 传真:010-62562533 Email:jos@iscas.ac.cn
技术支持:北京勤云科技发展有限公司

京公网安备 11040202500063号