Semi-Supervised Clustering Based on Affinity Propagation Algorithm (基于近邻传播算法的半监督聚类)

Authors: Xiao Yu, Yu Jian

Funding: Supported by the National Natural Science Foundation of China under Grant No. 60875031; the National Basic Research Program of China (973 Program) under Grant No. 2007CB311002; the Program for New Century Excellent Talents in University of China under Grant No. NCET-06-0078; the Research Fund for the Doctoral Program of Higher Education of the Ministry of Education of China under Grant No. 20050004008; and the Fok Ying-Tong Education Foundation for Young Teachers in the Higher Education Institutions of China under Grant No. 101068


    摘要 (Abstract in Chinese, translated):

    This paper proposes a semi-supervised clustering method based on the affinity propagation (AP) algorithm. AP clusters data on the basis of a similarity matrix over the data points. For very large datasets, AP is a fast and effective clustering method, beyond the reach of traditional clustering algorithms such as K-centers clustering. For datasets with complex cluster structures, however, AP often fails to produce good clustering results. The proposed method uses known labeled data or pairwise constraints to adjust the similarity matrix formed from the data, thereby improving the clustering performance of AP. Experimental results show that the method not only improves AP's results on complex data, but also outperforms the comparative algorithms when the number of constraint pairs is large.

    Abstract:

    A semi-supervised clustering method based on the affinity propagation (AP) algorithm is proposed in this paper. AP takes as input a matrix of similarities between pairs of data points. Compared with existing clustering algorithms such as K-centers clustering, AP is an efficient and fast clustering algorithm for large datasets. For datasets with complex cluster structures, however, it cannot produce good clustering results. The clustering performance of AP can be improved by using prior knowledge, in the form of labeled data or pairwise constraints, to adjust the similarity matrix. Experimental results show that the method indeed reaches its goal on complex datasets, and that it outperforms the comparative methods when a large number of pairwise constraints are available.
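The pipeline the abstract outlines — adjust the pairwise similarity matrix with constraints, then run standard AP message passing on it — can be sketched as follows. This is a minimal illustration, not the authors' implementation: the adjustment rule (clamping must-link pairs to the largest similarity in the matrix and cannot-link pairs to the smallest), the median-similarity preference, and all function names are assumptions; only the AP update equations follow Frey and Dueck's paper [17].

```python
import numpy as np

def adjust_similarity(S, must_link=(), cannot_link=()):
    # One simple adjustment rule (the paper's exact scheme may differ):
    # must-link pairs get the largest similarity in S, cannot-link the smallest.
    S = S.copy()
    hi, lo = S.max(), S.min()
    for i, j in must_link:
        S[i, j] = S[j, i] = hi
    for i, j in cannot_link:
        S[i, j] = S[j, i] = lo
    return S

def affinity_propagation(S, max_iter=200, damping=0.5):
    # Minimal AP (Frey & Dueck, Science 2007): alternate damped responsibility
    # and availability messages until exemplars emerge.
    n = S.shape[0]
    R = np.zeros((n, n))
    A = np.zeros((n, n))
    rows = np.arange(n)
    for _ in range(max_iter):
        # Responsibilities: r(i,k) = s(i,k) - max_{k'!=k} [a(i,k') + s(i,k')]
        AS = A + S
        top = np.argmax(AS, axis=1)
        first = AS[rows, top]
        AS[rows, top] = -np.inf
        second = AS.max(axis=1)
        R_new = S - first[:, None]
        R_new[rows, top] = S[rows, top] - second
        R = damping * R + (1 - damping) * R_new
        # Availabilities: a(i,k) = min(0, r(k,k) + sum_{i'!=i,k} max(0, r(i',k)))
        Rp = np.maximum(R, 0)
        Rp[rows, rows] = R[rows, rows]
        A_new = Rp.sum(axis=0)[None, :] - Rp
        diag = A_new[rows, rows].copy()
        A_new = np.minimum(A_new, 0)
        A_new[rows, rows] = diag          # a(k,k) = sum_{i'!=k} max(0, r(i',k))
        A = damping * A + (1 - damping) * A_new
    # Points whose self-responsibility plus self-availability is positive
    # are exemplars; every other point joins its most similar exemplar.
    exemplars = np.flatnonzero(np.diag(R) + np.diag(A) > 0)
    labels = exemplars[np.argmax(S[:, exemplars], axis=1)]
    labels[exemplars] = exemplars
    return labels

# Toy data: two well-separated Gaussian blobs; similarity = negative squared
# Euclidean distance, preferences (diagonal of S) set to the median similarity.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 0.3, (10, 2)), rng.normal(5.0, 0.3, (10, 2))])
S = -((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
np.fill_diagonal(S, np.median(S))
labels = affinity_propagation(adjust_similarity(S, must_link=[(0, 1)]))
```

On this toy data the raw similarities already separate the blobs, so the single must-link pair changes nothing; the adjustment matters on data whose cluster structure the raw similarities fail to capture, which is the regime the paper targets.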

    参考文献
    [1] Demiriz A, Bennett KP, Embrechts MJ. Semi-Supervised clustering using genetic algorithm. In: Dagli CH, ed. Proc. of the Intelligent Engineering Systems Through Artificial Neural Networks (ANNIE’99). New York: ASME Press, 1999. 809-814.
    [2] Bilenko M, Basu S, Mooney RJ. Integrating constraints and metric learning in semi-supervised clustering. In: Russ G, Dale S, eds. Proc. of the 21st Int’l Conf. on Machine Learning (ICML 2004). Banff: ACM Press, 2004. 81-88.
    [3] Basu S, Bilenko M, Mooney RJ. A probabilistic framework for semi-supervised clustering. In: Won K, Ron K, Johannes G, William D, eds. Proc. of the 10th ACM SIGKDD Int’l Conf. on Knowledge Discovery and Data Mining (KDD 2004). Seattle: ACM Press, 2004. 59-68.
    [4] Wagstaff K, Cardie C. Clustering with instance-level constraints. In: Pat L, ed. Proc. of the 17th Int’l Conf. on Machine Learning (ICML 2000). Stanford: Morgan Kaufmann Publishers, 2000. 1103-1110.
    [5] Wagstaff K, Cardie C, Rogers S, Schroedl S. Constrained K-means clustering with background knowledge. In: Carla EB, Andrea PD, eds. Proc. of the 18th Int’l Conf. on Machine Learning (ICML 2001). Williamstown: Morgan Kaufmann Publishers, 2001. 577-584.
    [6] Basu S, Banerjee A, Mooney RJ. Semi-Supervised clustering by seeding. In: Claude S, Achim GH, eds. Proc. of 19th Int’l Conf. on Machine Learning (ICML 2002). Sydney: Morgan Kaufmann Publishers, 2002. 27-34.
    [7] Kamvar SD, Klein D, Manning CD. Spectral learning. In: Georg G, Toby W, eds. Proc. of the 18th Int’l Joint Conf. on Artificial Intelligence (IJCAI 2003). Morgan Kaufmann Publishers, 2003. 561-566.
    [8] Xu QJ, desJardins M, Wagstaff K. Constrained spectral clustering under a local proximity structure assumption. In: Ingrid R, Zdravko M, eds. Proc. of the 18th Int’l Florida Artificial Intelligence Research Society Conf. (FLAIRS 2005). AAAI Press, 2005. 866-867.
    [9] Klein D, Kamvar SD, Manning CD. From instance-level constraints to space-level constraints: Making the most of prior knowledge in data clustering. In: Claude S, Achim GH, eds. Proc. of the 19th Int’l Conf. on Machine Learning (ICML 2002). Sydney: Morgan Kaufmann Publishers, 2002. 307-314.
    [10] Wang L, Bo LF, Jiao LC. Density-Sensitive semi-supervised spectral clustering. Journal of Software, 2007,18(10):2412-2422 (in Chinese with English abstract). http://www.jos.org.cn/1000-9825/18/2412.htm
    [11] Xing EP, Ng AY, Jordan MI, Russell S. Distance metric learning with application to clustering with side-information. In: Thrun S, Becker S, Obermayer K, eds. Advances in Neural Information Processing Systems (NIPS 2003). Cambridge: MIT Press, 2003. 505-512.
    [12] Schultz M, Joachims T. Learning a distance metric from relative comparisons. In: Thrun S, Becker S, Obermayer K, eds. Advances in Neural Information Processing Systems (NIPS 2003). Cambridge: MIT Press, 2003. 40-47.
    [13] Bar-Hillel A, Hertz T, Shental N, Weinshall D. Learning distance functions using equivalence relations. In: Tom F, Nina M, eds. Proc. of the 20th Int’l Conf. on Machine Learning (ICML 2003). Washington: AAAI Press, 2003. 11-18.
    [14] Tang W, Xiong H, Zhong S, Wu J. Enhancing semi-supervised clustering: A feature projection perspective. In: Pavel B, Rich C, Xindong W, eds. Proc. of the 13th ACM SIGKDD Int’l Conf. on Knowledge Discovery and Data Mining (KDD 2007). San Jose: ACM, 2007. 707-716.
    [15] Eick CF, Zeidat N, Zhao ZH. Supervised clustering: Algorithms and benefits. In: Proc. of the 16th IEEE Int’l Conf. on Tools with Artificial Intelligence (ICTAI 2004). Boca Raton: IEEE Press, 2004. 774-776.
    [16] Dettling M, Bühlmann P. Supervised clustering of genes. Genome Biology, 2002,3(12):research0069.1-0069.15.
    [17] Frey BJ, Dueck D. Clustering by passing messages between data points. Science, 2007,315(5814):972-976.
    [18] Mézard M. Where are the exemplars? Science, 2007,315(5814):949-951.
    [19] Shi JB, Malik J. Normalized cuts and image segmentation. IEEE Trans. on Pattern Analysis and Machine Intelligence, 2000,22(8): 888-905.
Cite this article

Xiao Y, Yu J. Semi-Supervised clustering based on affinity propagation algorithm. Journal of Software, 2008,19(11):2803-2813 (in Chinese with English abstract).

History
  • Received: 2008-03-01
  • Revised: 2008-08-26