Abstract:Incorporating instance-level constraints into K-means algorithm can improve the accuracy of clustering. As the partition generated is represented by K centers and a cluster is represented by only one center, the representation model prevents further improvement of the accuracy. Based upon the instance-level constraints, two types of constraints between instance and class are presented, three types of constraints between classes are presented too, and the constrained partition model is presented and analyzed. In this model, based upon the constraints between sub-clusters, more centers are utilized to represent one cluster, which makes the representation of partition flexible and precise. An algorithm CKS (constrained K-means with subsets) is presented to generate the constrained partition. The experiments on three UCI datasets: Glass, Iris and Sonar, suggest that CKS is remarkably superior to COP-K-means in accuracy and robustness, and is better than CCL too. The time for running CKS is neither significantly influenced by the number of constraints compared with COP-K-means, nor remarkably increased when the number of instances is increased compared with CCL.