Density Peak Clustering Algorithm Based on K-nearest Neighbors and Optimized Allocation Strategy
Author:
Affiliation:

Clc Number:

Fund Project:

  • Article
  • |
  • Figures
  • |
  • Metrics
  • |
  • Reference
  • |
  • Related
  • |
  • Cited by
  • |
  • Materials
  • |
  • Comments
    Abstract:

    The density peak clustering (DPC) algorithm is a simple and effective clustering analysis algorithm. However, in real-world practical applications, it is difficult for DPC to select the correct cluster centers for datasets with large differences of density among clusters or multi-density peaks in clusters. Furthermore, the allocation method of points in DPC has a domino effect. To address these issues, a density peak clustering algorithm based on the K-nearest neighbors (KNN) and the optimized allocation strategy was proposed. First, the candidate cluster centers using the KNN, densities of points, and boundary points were determined. The path distance was defined to reflect the similarity between the candidate cluster centers, based on which, the density factor and distance factor were proposed to quantify the possibility of candidate cluster centers as cluster centers, and then the cluster centers were determined. Second, to improve the allocation precision of points, according to the shared nearest neighbors, high density nearest neighbor, density difference, and distance between KNN, the similarity measures were constructed, and then some concepts of the neighborhood, similarity set, and similarity domain were proposed to assist in the allocation of points. The initial clustering results were determined according to the similarity domains and boundary points, and then the intermediate clustering results were achieved based on the cluster centers. Finally, according to the intermediate clustering results and similarity set, the clusters were divided into multiple layers from the cluster centers to the cluster boundaries, for which the allocation strategies of points were designed, respectively. To determine the allocation order of points in the specific layer, the positive value was presented based on the similarity domain and positive domain. The point was allocated to the dominant cluster in its positive domain. Thus, the final clustering results were obtained. The experimental results on 11 synthetic datasets and 27 real datasets demonstrate that the proposed algorithm has sound clustering performance in metrics of the purity, F-measure, accuracy, Rand index, adjusted Rand index, and normalized mutual information when compared with the state-of-the-art DPC algorithms.

    Reference
    Related
    Cited by
Get Citation

孙林,秦小营,徐久成,薛占熬.基于K近邻和优化分配策略的密度峰值聚类算法.软件学报,2022,33(4):1390-1411

Copy
Share
Article Metrics
  • Abstract:
  • PDF:
  • HTML:
  • Cited by:
History
  • Received:January 10,2021
  • Revised:July 16,2021
  • Adopted:
  • Online: October 26,2021
  • Published: April 06,2022
You are the firstVisitors
Copyright: Institute of Software, Chinese Academy of Sciences Beijing ICP No. 05046678-4
Address:4# South Fourth Street, Zhong Guan Cun, Beijing 100190,Postal Code:100190
Phone:010-62562563 Fax:010-62562533 Email:jos@iscas.ac.cn
Technical Support:Beijing Qinyun Technology Development Co., Ltd.

Beijing Public Network Security No. 11040202500063