A Fast Clustering Algorithm Based on Reference and Density
Funding: Supported by the National High-Tech Research and Development Plan of China (863 Program) under Grant No.2002AA483440; the National Grand Fundamental Research 973 Program of China under Grant No.G1999032705; and the Foundation of the Innovation Research Institute of PKU-IBM of China.


    Abstract:

    As data sets grow ever larger, data mining algorithms must run with high efficiency. Density-based clustering is one family of clustering-analysis methods; its main advantages are that it can discover clusters of arbitrary shape and is insensitive to noise. This paper presents a new clustering algorithm based on reference points and density, called CURD (clustering using references and density). Its key innovation is to use reference points to capture the spatial geometric features of the data (the shape and extent of each cluster) accurately, and then to analyze and process the data on the basis of these reference points. CURD retains the advantages of density-based clustering noted above, and its time complexity is approximately linear, which makes it suitable for mining very large data sets. Theoretical analysis and experimental results confirm that CURD can discover clusters of arbitrary shape and is insensitive to noise, and that it runs markedly faster than the traditional R*-tree-based DBSCAN algorithm.
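The abstract describes CURD only at the level of its idea: summarize the data with reference points that carry density, then cluster on the basis of those reference points. As a rough illustration of that idea, and explicitly not the procedure given in the paper, the Python sketch below makes its own assumptions: points are bucketed into a uniform grid; each cell holding at least `min_density` points contributes one reference point (the mean of its points, with the point count as its density); sparser cells are treated as noise; and reference points in neighbouring cells are linked when they lie within `cell_size` of each other. The function name `reference_density_cluster` and both parameters are hypothetical.

```python
import math
from collections import defaultdict, deque
from itertools import product
from typing import Dict, List, Sequence, Tuple

Cell = Tuple[int, ...]


def reference_density_cluster(
    points: Sequence[Sequence[float]],
    cell_size: float,
    min_density: int,
) -> List[int]:
    """Return one label per point: a cluster id >= 0, or -1 for noise.

    Illustrative sketch only: grid cells, cell_size and min_density are
    assumptions, not the CURD procedure from the paper.
    """
    if not points:
        return []
    dim = len(points[0])

    # First pass over the data (O(n)): bucket every point into a grid cell.
    members: Dict[Cell, List[int]] = defaultdict(list)
    for i, p in enumerate(points):
        members[tuple(math.floor(c / cell_size) for c in p)].append(i)

    # One reference point per dense cell: the mean of the cell's points.
    # Its density is the number of points it stands for.
    refs: Dict[Cell, Tuple[float, ...]] = {}
    for cell, idx in members.items():
        if len(idx) >= min_density:
            refs[cell] = tuple(
                sum(points[i][k] for i in idx) / len(idx) for k in range(dim)
            )

    # Cluster the reference points, not the raw data: breadth-first search
    # over dense cells, linking neighbours whose references are close.
    offsets = [o for o in product((-1, 0, 1), repeat=dim) if any(o)]
    cell_label: Dict[Cell, int] = {}
    next_label = 0
    for start in refs:
        if start in cell_label:
            continue
        cell_label[start] = next_label
        queue = deque([start])
        while queue:
            cell = queue.popleft()
            for off in offsets:
                nb = tuple(c + o for c, o in zip(cell, off))
                if (
                    nb in refs
                    and nb not in cell_label
                    and math.dist(refs[cell], refs[nb]) <= cell_size
                ):
                    cell_label[nb] = next_label
                    queue.append(nb)
        next_label += 1

    # Second pass over the data (O(n)): each point inherits its cell's label;
    # points in sparse cells (no reference point) are reported as noise.
    labels = [-1] * len(points)
    for cell, idx in members.items():
        for i in idx:
            labels[i] = cell_label.get(cell, -1)
    return labels


if __name__ == "__main__":
    blob_a = [(0.1, 0.1), (0.2, 0.3), (0.3, 0.2), (0.4, 0.4)]
    blob_b = [(5.1, 5.1), (5.2, 5.3), (5.3, 5.2)]
    outlier = [(9.5, 0.5)]
    print(reference_density_cluster(blob_a + blob_b + outlier, 1.0, 3))
    # [0, 0, 0, 0, 1, 1, 1, -1]
```

Under these assumptions the raw data are scanned twice in O(n) time, while the linking step touches only the m reference points and their 3^d - 1 neighbouring cells, so the cost stays close to linear in n when m is much smaller than n. This is one way to see how working on references can avoid the per-point neighbourhood queries that DBSCAN performs. Restricting the search to neighbouring cells loses no links here: two reference points within `cell_size` of each other always fall in cells whose indices differ by at most one in every coordinate.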

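Cite this article:

Ma Shuai (马帅), Wang Tengjiao (王腾蛟), Tang Shiwei (唐世渭), Yang Dongqing (杨冬青), Gao Jun (高军). A fast clustering algorithm based on reference and density. Journal of Software (软件学报), 2003, 14(6): 1089-1095.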

History
  • Received: 2002-04-19
  • Last revised: 2002-07-02
Copyright: Institute of Software, Chinese Academy of Sciences