Journal of Software
ISSN: 1000-9825
2023, Vol. 34, No. 12, pp. 5629-5648
DOI: 10.13328/j.cnki.jos.006756
Article
Density Peaks Clustering Algorithm Based on Representative Points and K-nearest Neighbors
Density peaks clustering (DPC) is a density-based clustering algorithm that can intuitively determine the number of clusters, identify clusters of arbitrary shape, and automatically detect and exclude outliers. However, DPC still has some shortcomings. On the one hand, the DPC algorithm considers only the global distribution, so its clustering performance is poor on datasets with large differences in cluster density. On the other hand, the point allocation strategy of DPC can easily trigger a domino effect. Hence, this study proposes a DPC algorithm based on representative points and K-nearest neighbors (KNN), namely RKNN-DPC. First, a KNN-based density is constructed, and representative points are introduced to characterize the global distribution of samples, from which a new local density is derived. Then, the KNN information of samples is used to design a weighted KNN allocation strategy that alleviates the domino effect. Finally, comparative experiments are conducted against five clustering algorithms on both synthetic and real-world datasets. The experimental results show that RKNN-DPC identifies cluster centers more accurately and obtains better clustering results.
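The abstract's two DPC quantities can be sketched in code. The following is a minimal illustration, not the authors' RKNN-DPC: it uses a generic KNN density (inverse of the mean distance to the k nearest neighbors, a common stand-in; the paper's representative-point density and weighted KNN allocation are not reproduced here), together with the standard DPC delta, the distance to the nearest higher-density sample. Points with large rho * delta are cluster-center candidates.

```python
import numpy as np

def knn_density_peaks(X, k=5):
    """Score cluster-center candidates in the DPC style.

    rho: a KNN-based local density (hypothetical variant: inverse of the
         mean distance to the k nearest neighbors).
    delta: distance to the nearest sample of higher density (standard DPC).
    gamma = rho * delta: large values indicate cluster-center candidates.
    """
    n = len(X)
    # Pairwise Euclidean distance matrix, shape (n, n).
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    # Sort each row; column 0 is the self-distance (0), so skip it.
    knn_d = np.sort(d, axis=1)[:, 1:k + 1]
    rho = 1.0 / (knn_d.mean(axis=1) + 1e-12)
    # delta: for the densest point, use its maximum distance;
    # otherwise, the minimum distance to any higher-density point.
    delta = np.empty(n)
    order = np.argsort(-rho)          # indices from highest to lowest rho
    delta[order[0]] = d[order[0]].max()
    for i, idx in enumerate(order[1:], start=1):
        delta[idx] = d[idx, order[:i]].min()
    gamma = rho * delta
    return rho, delta, gamma
```

On two well-separated Gaussian blobs, the two largest gamma values fall one in each blob, matching the intuition that density peaks far from any denser point mark cluster centers.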
cluster analysis;density peaks clustering (DPC);representative point;K-nearest neighbors (KNN)
ZHANG Qing-Hua, ZHOU Jing-Peng, DAI Yong-Yang, WANG Guo-Yin
jos/article/abstract/6756