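[Keywords (translated from Chinese)]
[Abstract (translated from Chinese)]
Speed and clustering quality are the two major challenges facing clustering algorithms. DBSCAN (density based spatial clustering of applications with noise) is a typical density-based clustering method, and clustering experiments on large databases have demonstrated its advantage in speed. This paper proposes a recursive density-based clustering algorithm (RDBC) that can intelligently and dynamically modify its density parameters. RDBC is an improved algorithm based on DBSCAN and has the same computational complexity as DBSCAN. Clustering experiments on Web documents show that RDBC not only retains the high speed of DBSCAN but also produces clustering results far superior to those of DBSCAN.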
[Keywords]
[Abstract]
Effectiveness and efficiency are the two major challenges facing clustering algorithms. DBSCAN is a typical density-based clustering algorithm that is very efficient on large databases. This paper presents a recursive density-based clustering algorithm, RDBC, that adaptively changes its density parameters. RDBC is based on DBSCAN and requires the same time complexity as DBSCAN. In addition, it is shown both analytically and experimentally that RDBC yields results superior to those of DBSCAN.
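For readers unfamiliar with the baseline, the following is a minimal sketch of the standard DBSCAN procedure on which RDBC builds. It is not the RDBC algorithm itself, whose recursive parameter adjustment the abstract does not specify. The parameter names `eps` (neighborhood radius) and `min_pts` (minimum neighbors for a core point) follow common DBSCAN usage and are assumptions here, as are the helper name `region_query` and the brute-force neighbor search, which costs quadratic time; efficient implementations typically rely on a spatial index.

```python
from math import dist

NOISE = -1  # label for points that belong to no cluster


def region_query(points, i, eps):
    """Return indices of all points within eps of points[i] (including i)."""
    return [j for j, q in enumerate(points) if dist(points[i], q) <= eps]


def dbscan(points, eps, min_pts):
    """Label each point with a cluster id (0, 1, ...) or NOISE."""
    labels = [None] * len(points)
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue  # already assigned to a cluster or marked as noise
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_pts:
            labels[i] = NOISE  # may later be relabeled as a border point
            continue
        cluster += 1  # points[i] is a core point: start a new cluster
        labels[i] = cluster
        queue = [j for j in neighbors if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: joins, but is not expanded
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_pts:
                queue.extend(j_neighbors)  # j is also a core point: keep growing
    return labels
```

Both `eps` and `min_pts` are fixed for the whole run in this sketch; according to the abstract, these density parameters are what RDBC adjusts dynamically.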
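[Chinese Library Classification number]
[Funding]
Supported by the National Key Basic Research Development Program of China (973 Program) under Grant No. G1998030509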