Abstract: In this paper, a Two-Phase Clustering (TPC) algorithm for data sets with complex distributions is proposed. TPC contains two phases. In the first phase, the data set is partitioned into a number of sub-clusters with spherical distribution, and each clustering center represents all the members of its corresponding sub-cluster. In the second phase, the clustering centers obtained in the first phase are themselves clustered, exploiting the strong clustering performance of Manifold Evolutionary Clustering (MEC) on data with complex distributions. The final result is then obtained by combining these two clustering results. The algorithm builds on an improved K-means and on MEC. Manifold distance is introduced into evolutionary clustering to make the algorithm suitable for clustering complex data sets. At the same time, because manifold distances are computed only among the clustering centers rather than among all data points, the new method reduces the computational cost incurred by manifold distance. Experimental results on seven artificial data sets and seven UCI data sets with different structures show that the new algorithm can efficiently identify clusters with simple or complex, convex or non-convex distributions, in comparison with genetic algorithm-based clustering, the K-means algorithm, and manifold evolutionary clustering. Furthermore, TPC clearly outperforms MEC in terms of computational time.
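The two-phase structure described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation, and several parts are assumptions made only for illustration: plain Lloyd's K-means with farthest-point seeding stands in for the "improved K-means"; a greedy single-linkage merge of the centers stands in for the evolutionary search of MEC; and the manifold distance is assumed to be a shortest-path distance over edge lengths of the form rho**d - 1, which the abstract does not define. All function and parameter names (`kmeans`, `manifold_distances`, `merge_centers`, `two_phase_clustering`, `n_sub`, `rho`) are hypothetical.

```python
import math


def nearest(p, centers):
    """Index of the center closest to point p (Euclidean distance)."""
    return min(range(len(centers)), key=lambda c: math.dist(p, centers[c]))


def kmeans(points, k, iters=20):
    """Phase 1 (assumed form): Lloyd's K-means with farthest-point seeding."""
    centers = [points[0]]
    while len(centers) < k:
        # Next seed: the point farthest from its nearest existing seed.
        centers.append(max(points, key=lambda p: min(math.dist(p, c) for c in centers)))
    for _ in range(iters):
        labels = [nearest(p, centers) for p in points]
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old center if a sub-cluster goes empty
                centers[c] = tuple(sum(x) / len(members) for x in zip(*members))
    # Final assignment so that labels match the returned centers.
    return centers, [nearest(p, centers) for p in points]


def manifold_distances(centers, rho=2.0):
    """Shortest-path distances over assumed edge lengths rho**d - 1.

    Inputs should be scaled so that rho**d stays within float range.
    """
    n = len(centers)
    d = [[rho ** math.dist(a, b) - 1.0 for b in centers] for a in centers]
    for m in range(n):  # Floyd-Warshall over the complete graph of centers
        for i in range(n):
            for j in range(n):
                d[i][j] = min(d[i][j], d[i][m] + d[m][j])
    return d


def merge_centers(dist, n_clusters):
    """Phase 2 stand-in for MEC: single-linkage merging of the centers."""
    group = list(range(len(dist)))
    while len(set(group)) > n_clusters:
        # Closest pair of centers that still belong to different groups.
        _, i, j = min((dist[i][j], i, j)
                      for i in range(len(dist)) for j in range(i)
                      if group[i] != group[j])
        old, new = group[i], group[j]
        group = [new if g == old else g for g in group]
    return group


def two_phase_clustering(points, n_sub, n_clusters, rho=2.0):
    centers, sub_labels = kmeans(points, n_sub)
    group = merge_centers(manifold_distances(centers, rho), n_clusters)
    # Final result: each point inherits the group of its sub-cluster center.
    return [group[s] for s in sub_labels]


# Toy data: two parallel strips of 21 points each.
strip_a = [(float(x), 0.0) for x in range(21)]
strip_b = [(float(x), 30.0) for x in range(21)]
labels = two_phase_clustering(strip_a + strip_b, n_sub=6, n_clusters=2)
```

In this sketch the pairwise manifold distances are computed only among the `n_sub` centers, not among all points, which is the source of the cost reduction the abstract refers to. The toy data are deliberately easy and serve only to show how the two results are combined.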