Discriminative Semi-Supervised Clustering Analysis with Pairwise Constraints

Authors: Yin Xue-Song, Hu En-Liang, Chen Song-Can

Funding: Supported by the National Natural Science Foundation of China under Grant Nos. 60505004, 60773061
    Abstract:

    Existing typical semi-supervised clustering methods struggle to handle violations of pairwise constraints effectively, and they cannot simultaneously cope with high-dimensional data. This paper proposes a discriminative semi-supervised clustering analysis method based on pairwise constraints that addresses both problems at once. The method effectively exploits supervised information to integrate dimensionality reduction and clustering: it clusters the data in the projected space with a pairwise-constraints-based K-means algorithm, and then uses the clustering result to select the projection space. At the same time, the algorithm reduces the computational complexity of constraint-based semi-supervised clustering and resolves violations of pairwise constraints during clustering. Experimental results on a collection of real-world datasets show that, compared with existing related semi-supervised clustering algorithms, the new method not only handles high-dimensional data but also effectively improves clustering performance.

    Abstract:

    Most existing semi-supervised clustering algorithms with pairwise constraints neither handle violations of pairwise constraints effectively nor cope with high-dimensional data at the same time. This paper presents a discriminative semi-supervised clustering analysis algorithm with pairwise constraints, called DSCA, which effectively utilizes supervised information to integrate dimensionality reduction and clustering. The proposed algorithm projects the data onto a low-dimensional manifold, where a pairwise-constraints-based K-means algorithm simultaneously clusters the data. Meanwhile, the pairwise-constraints-based K-means algorithm presented in this paper reduces the computational complexity of constraint-based semi-supervised algorithms and resolves the problem of violated pairwise constraints in existing semi-supervised clustering algorithms. Experimental results on real-world datasets demonstrate that the proposed algorithm can effectively deal with high-dimensional data and provides appealing clustering performance compared with state-of-the-art semi-supervised algorithms.
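The abstract describes a K-means variant that respects must-link and cannot-link pairwise constraints while clustering in a projected space. As a rough illustration of the constraint-respecting assignment step only, here is a minimal sketch in the spirit of constrained K-means: each point is greedily assigned to the nearest centroid that does not conflict with points already assigned in the current pass. The function names, the greedy assignment order, and the nearest-centroid fallback are our own assumptions for illustration, not taken from the paper; DSCA itself additionally alternates this clustering with selecting a discriminative projection, which is not shown here.

```python
import numpy as np

def _violates(i, c, labels, must_link, cannot_link):
    """True if putting point i into cluster c conflicts with a
    constraint against a point already assigned in this pass."""
    for a, b in must_link:
        if i in (a, b):
            other = b if i == a else a
            if labels[other] not in (-1, c):  # partner already elsewhere
                return True
    for a, b in cannot_link:
        if i in (a, b):
            other = b if i == a else a
            if labels[other] == c:  # forbidden partner already in c
                return True
    return False

def constrained_kmeans(X, k, must_link=(), cannot_link=(), n_iter=50, seed=0):
    """Sketch of K-means with pairwise constraints (illustrative only)."""
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    centroids = X[rng.choice(len(X), size=k, replace=False)].copy()
    labels = np.full(len(X), -1)
    for _ in range(n_iter):
        new_labels = np.full(len(X), -1)
        for i in range(len(X)):
            # candidate clusters from nearest to farthest centroid
            order = np.argsort(((centroids - X[i]) ** 2).sum(axis=1))
            for c in order:
                if not _violates(i, c, new_labels, must_link, cannot_link):
                    new_labels[i] = c
                    break
            else:
                # no feasible cluster: fall back to the nearest centroid
                new_labels[i] = order[0]
        if np.array_equal(new_labels, labels):
            break  # assignments stable: converged
        labels = new_labels
        for c in range(k):
            # recompute centroids; keep the old one if a cluster is empty
            if np.any(labels == c):
                centroids[c] = X[labels == c].mean(axis=0)
    return labels, centroids
```

For example, with two obvious 2-D clusters, must-links (0,1) and (2,3), and a cannot-link (0,2), the sketch groups the pairs together and keeps the cannot-linked points in different clusters regardless of the random initialization.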

Cite this article:

Yin XS, Hu EL, Chen SC. Discriminative semi-supervised clustering analysis with pairwise constraints. Journal of Software, 2008,19(11):2791-2802

History:
  • Received: 2008-01-08
  • Revised: 2008-08-26