Abstract:Clustering analysis is a hot research topic in the fields of statistics, pattern recognition, and machine learning. Through effective clustering analysis, the intrinsic structure and characteristics of datasets can be well discovered. However, due to the unsupervised learning feature, the existing clustering methods are still facing the problems of unstable and inaccurate on processing different types of datasets. In order to solve these problems, a hybrid clustering algorithm, K-means-AHC, is firstly proposed based on the combination of the K-means algorithm and the hierarchical clustering algorithm. Then, based on the inflexion point detection, a new clustering validity index, DAS (difference of average synthesis degree), is proposed to evaluate the results of the K-means-AHC clustering algorithm. Finally, through the combination of the K-means-AHC algorithm and the DAS index, an effective method of finding the optimal clustering numbers and optimal partitions of datasets is designed. The K-means-AHC algorithm is used to test many kinds of datasets. The experimental results have shown that the proposed algorithm improves the accuracy of clustering analysis while without too much time overhead. At the same time, the new DAS index is superior to the current commonly used clustering validity indexes in the evaluation of clustering results.