Most existing semi-supervised clustering algorithms with pairwise constraints neither resolve violations of pairwise constraints effectively nor handle high-dimensional data. This paper presents a discriminative semi-supervised clustering analysis algorithm with pairwise constraints, called DSCA, which effectively exploits supervised information to integrate dimensionality reduction and clustering. The proposed algorithm projects the data onto a low-dimensional manifold, where a pairwise-constraint-based K-means algorithm simultaneously clusters the data. This K-means variant reduces the computational complexity of constraint-based semi-supervised algorithms and resolves the constraint-violation problem found in existing semi-supervised clustering algorithms. Experimental results on real-world datasets demonstrate that the proposed algorithm handles high-dimensional data effectively and delivers appealing clustering performance compared with state-of-the-art semi-supervised algorithms.
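The abstract does not give pseudocode for the pairwise-constraint-based K-means step, so the following is only a rough illustration of the general idea it alludes to: a COP-KMeans-style assignment that places each point in the nearest centroid's cluster unless doing so would break a must-link or cannot-link constraint. All function names, the fallback rule for infeasible points, and the fixed iteration count are assumptions for this sketch, not details of DSCA itself.

```python
import math
import random

def violates(i, c, labels, must_link, cannot_link):
    """Check whether assigning point i to cluster c breaks a constraint:
    a must-link pair must share a cluster; a cannot-link pair must not."""
    for a, b in must_link:
        j = b if a == i else a if b == i else None
        if j is not None and labels[j] not in (-1, c):
            return True
    for a, b in cannot_link:
        j = b if a == i else a if b == i else None
        if j is not None and labels[j] == c:
            return True
    return False

def constrained_kmeans(X, k, must_link, cannot_link, n_iter=20, seed=0):
    """COP-KMeans-style sketch: K-means whose assignment step tries
    centroids from nearest to farthest and skips constraint violations."""
    rng = random.Random(seed)
    centroids = [list(X[i]) for i in rng.sample(range(len(X)), k)]
    labels = [-1] * len(X)
    for _ in range(n_iter):
        labels = [-1] * len(X)  # reassign every point each pass
        for i, x in enumerate(X):
            order = sorted(range(k), key=lambda c: math.dist(x, centroids[c]))
            for c in order:
                if not violates(i, c, labels, must_link, cannot_link):
                    labels[i] = c
                    break
            if labels[i] == -1:
                labels[i] = order[0]  # fallback; COP-KMeans reports failure here
        for c in range(k):  # standard centroid update over assigned points
            pts = [X[i] for i in range(len(X)) if labels[i] == c]
            if pts:
                centroids[c] = [sum(col) / len(pts) for col in zip(*pts)]
    return labels, centroids
```

In DSCA this clustering step would run on the low-dimensional projection of the data rather than on the raw features, which is what couples the dimensionality reduction with the constrained clustering.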