A Constrained Partition Model and K-Means Algorithm
Abstract:

    Incorporating instance-level constraints into the K-means algorithm can improve clustering accuracy. However, because the generated partition is represented by K centers, with each cluster represented by a single center, this representation model limits further improvement of accuracy. Building on instance-level constraints, two types of constraints between instances and classes, as well as three types of constraints between classes, are introduced, and a constrained partition model is presented and analyzed. In this model, guided by the constraints between sub-clusters, multiple centers are used to represent a single cluster, which makes the representation of the partition more flexible and precise. An algorithm, CKS (constrained K-means with subsets), is presented to generate the constrained partition. Experiments on three UCI datasets (Glass, Iris, and Sonar) suggest that CKS is markedly superior to COP-K-means in accuracy and robustness, and also outperforms CCL. The running time of CKS is neither significantly affected by the number of constraints (unlike COP-K-means) nor does it increase markedly as the number of instances grows (unlike CCL).
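To illustrate the baseline the abstract compares against, the following is a minimal sketch of COP-K-means-style constrained assignment: each point is assigned to the nearest center that does not violate any must-link or cannot-link constraint with the points assigned so far. This is not the paper's CKS algorithm (which represents one cluster with multiple sub-cluster centers); the function names and parameters here are illustrative assumptions.

```python
# Sketch of COP-K-means-style constrained assignment (illustrative, not CKS).
import numpy as np

def violates(i, c, labels, must_link, cannot_link):
    """True if putting point i into cluster c breaks a constraint
    with any point that has already been assigned (labels[j] != -1)."""
    for a, b in must_link:
        j = b if a == i else a if b == i else None
        if j is not None and labels[j] != -1 and labels[j] != c:
            return True
    for a, b in cannot_link:
        j = b if a == i else a if b == i else None
        if j is not None and labels[j] == c:
            return True
    return False

def cop_kmeans(X, k, must_link=(), cannot_link=(), n_iter=20, seed=0):
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    labels = np.full(len(X), -1)
    for _ in range(n_iter):
        labels = np.full(len(X), -1)
        for i in range(len(X)):
            # Try centers in order of increasing distance; take the first
            # assignment that satisfies all constraints.
            for c in np.argsort(np.linalg.norm(X[i] - centers, axis=1)):
                if not violates(i, c, labels, must_link, cannot_link):
                    labels[i] = c
                    break
            if labels[i] == -1:
                return None  # no constraint-satisfying assignment exists
        for c in range(k):  # standard K-means center update
            if np.any(labels == c):
                centers[c] = X[labels == c].mean(axis=0)
    return labels, centers
```

A known weakness of this hard-assignment scheme, which motivates richer models such as the constrained partition model above, is that it can fail outright (returning no clustering) when the greedy assignment order paints itself into a corner, and a single center per cluster cannot capture constraint structure inside a cluster.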

He Zhenfeng, Xiong Fanlun. A constrained partition model and K-means algorithm. Journal of Software, 2005,16(5):799-809 (in Chinese with English abstract).

History
  • Received: January 09, 2004
  • Revised: March 17, 2004