基于多项式核的结构化有向树数据聚类算法

微信服务号

微信订阅号

首页 > 过刊浏览>2008年第19卷第12期 >3147-3160

基于多项式核的结构化有向树数据聚类算法
DOI:
                        
                    
作者:
                        
                        
                    
作者单位:
作者简介:
通讯作者:
中图分类号:
基金项目:Supported by the National Natural Science Foundation of China under Grant No.60632050 (国家自然科学基金)

Polynomial Kernel Based Structural Clustering Algorithm by Building Directed Trees

Author:

Affiliation:

Fund Project:

摘要

图/表

访问统计

参考文献

相似文献

引证文献

资源附件

文章评论

摘要:

各个点在数据内部的组织结构中自然地扮演着3种不同的结构性角色,分别是毂、质心和野值.在基于邻域的聚类算法中,邻域密度因子能够识别分离数据集中的毂、质心和野值.但是,邻域密度因子对有噪声和重叠的数据往往失效.为了解决该问题,引入了基于多项式核的邻域密度因子,并在有向树框架下,提出了一种结构化的数据聚类算法,其计算复杂度线性于输入数据的大小.对带有噪声和重叠的数据集,该算法能够找到所有显著的、任意形状的不均衡聚类.在人工和真实数据集上的实验结果都证实了该算法的有效性和快速性.

Abstract:

Within the internal organization of the data, the data points respectively play three different structural roles: the hub, centroid and outlier. The neighborhood-based density factor (NDF) used in the neighborhood based clustering (NBC) algorithm has the ability of identifying which points act as hubs, centriods or outliers in separated-well data set. However, NDF often works poorly in the circumstances of noise and overlapping. This paper introduces a polynomial kernel based neighborhood density factor (PKNDF) to address this issue. Relying on the PKNDF, a structural data clustering algorithm is further presented which can find all salient clusters with arbitrary shapes and unbalanced sizes in a noisy or overlapping data set. It builds clusters into the framework of directed trees in graph theory and thereby each point is scanned only once in the process of clustering. Hence, its computational complexity is nearly linear in the size of the input data. Experimental results on both synthetic and real-world datasets have demonstrated its effectiveness and efficiency.

参考文献

相似文献

引证文献

引用本文

丁军娣,马儒宁,陈松灿.基于多项式核的结构化有向树数据聚类算法.软件学报,2008,19(12):3147-3160

复制

文章指标

点击次数:
下载次数:
HTML阅读次数:
引用次数:

历史

收稿日期:2007-10-20
最后修改日期:2008-03-28
录用日期:
在线发布日期:
出版日期:

微信服务号

微信订阅号

引用本文

分享

文章指标

历史