Mining Evolutionary Events from Multi-Streams Based on Spectral Clustering

Fund Project: Supported by the National Natural Science Foundation of China under Grant No.600773169; the 11th Five-Year Key Programs for Science & Technology Development of China under Grant No.2006BAI05A01
    Abstract:

    To address the difficult problem of mining evolutionary events from multiple data streams (multi-streams), this paper proposes SCAM (spectral clustering algorithm of multi-streams), a spectral clustering algorithm that generates clustering models of multi-streams. Its similarity matrix is built from the coupling degree, which measures the dynamic similarity between two streams. The paper further proposes EEMA (evolutionary events mining algorithm), which detects evolutionary event points from the drift of these clustering models. To measure how compact a clustering model is, the paper defines its clustering cohesion and proves an upper bound on it. It then defines the clustering model quality, which combines the distance to this upper bound with the eigengap of the normalized similarity matrix (motivated by matrix perturbation theory). EEMA uses this quality as its optimization objective to determine the number of clusters k automatically. Finally, O-EEMA (optimized EEMA) is presented as an optimized implementation of EEMA with time complexity O(cn²/2). Experiments on synthetic and real data sets show that EEMA and O-EEMA are effective and practical.
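The abstract names the pipeline but not its formulas. The sketch below therefore covers only the generic part it spells out: build a stream-to-stream similarity matrix, normalize it, choose k from the eigengap, cluster the streams spectrally, and flag an event when the partition of streams changes between two windows. It is not the paper's method. The coupling degree, the clustering cohesion and its upper bound, and the combined clustering model quality are not defined here. For that reason, `coupling_proxy` (clipped correlation of first differences) and `partition_changed` (co-membership comparison) are hypothetical stand-ins, and k is picked from the eigengap alone. It assumes NumPy is available.

```python
import numpy as np


def coupling_proxy(streams):
    """HYPOTHETICAL stand-in for the paper's coupling degree.

    Pearson correlation of first differences (how streams move together),
    clipped to [0, 1] so opposite-moving streams get zero similarity.
    Rows of `streams` are streams, columns are time steps.
    """
    corr = np.corrcoef(np.diff(streams, axis=1))
    return np.clip(corr, 0.0, 1.0)


def normalized_similarity(W):
    """Symmetric normalization D^{-1/2} W D^{-1/2}."""
    inv_sqrt = 1.0 / np.sqrt(W.sum(axis=1))
    return inv_sqrt[:, None] * W * inv_sqrt[None, :]


def eigengap_k(M, k_max):
    """Pick k at the largest gap between consecutive eigenvalues of M."""
    vals = np.linalg.eigvalsh(M)[::-1]            # eigenvalues, descending
    k_max = min(k_max, len(vals) - 1)
    gaps = vals[:k_max] - vals[1:k_max + 1]       # gaps[i] = lambda_i - lambda_{i+1}
    return int(np.argmax(gaps)) + 1


def kmeans(X, k, iters=50):
    """Plain k-means with deterministic farthest-first initialization."""
    centers = [X[0]]
    for _ in range(1, k):
        d = np.min([((X - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(X[int(np.argmax(d))])
    centers = np.array(centers)
    for _ in range(iters):
        dist = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = np.argmin(dist, axis=1)
        new = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                        else centers[j] for j in range(k)])
        if np.allclose(new, centers):
            break
        centers = new
    return labels


def spectral_cluster(streams, k_max=4):
    """Cluster one window of multi-streams; returns (k, labels)."""
    M = normalized_similarity(coupling_proxy(streams))
    k = eigengap_k(M, k_max)
    _, vecs = np.linalg.eigh(M)                   # ascending eigenvalue order
    U = vecs[:, ::-1][:, :k]                      # top-k eigenvectors
    U = U / np.linalg.norm(U, axis=1, keepdims=True)  # row-normalize (Ng-Jordan-Weiss)
    return k, kmeans(U, k)


def partition_changed(labels_a, labels_b):
    """HYPOTHETICAL event test: did the co-membership of any stream pair change?"""
    same_a = labels_a[:, None] == labels_a[None, :]
    same_b = labels_b[:, None] == labels_b[None, :]
    return bool(np.any(same_a != same_b))
```

Comparing co-membership matrices rather than raw labels makes the event test insensitive to cluster relabeling between windows. In an EEMA-like loop, one would call `spectral_cluster` on consecutive sliding windows and report a candidate event wherever `partition_changed` returns `True`.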

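Cite this article:

YANG Ning, TANG Chang-Jie, WANG Yue, CHEN Yu, ZHENG Jiao-Ling. Mining evolutionary events from multi-streams based on spectral clustering. Journal of Software, 2010, 21(10): 2395-2409.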

History:
  • Received: 2009-04-22
  • Last revised: 2009-10-10
Copyright: Institute of Software, Chinese Academy of Sciences