Approximate Weighted Kernel k-means for Large-Scale Spectral Clustering
Author: Jia HJ, Ding SF, Shi ZZ

Abstract:

    Spectral clustering is grounded in algebraic graph theory: it recasts clustering as a graph partitioning problem. To optimize the graph-cut objective, the properties of the Rayleigh quotient are typically exploited to map the original data points into a lower-dimensional eigenspace, obtained by computing the eigenvectors of the Laplacian matrix; clustering is then performed in this new space. However, the space complexity of storing the similarity matrix is O(n²), and the time complexity of the eigen-decomposition of the Laplacian matrix is usually O(n³). Such complexity is unacceptable for large-scale data sets. It can be proved that both normalized-cut graph clustering and weighted kernel k-means are equivalent to a matrix trace maximization problem, which implies that the weighted kernel k-means algorithm can optimize the normalized-cut objective without eigen-decomposing the Laplacian matrix. Nonetheless, weighted kernel k-means still needs to compute the kernel matrix, so its space complexity remains O(n²). To address this challenge, this study proposes an approximate weighted kernel k-means algorithm in which only part of the kernel matrix is used, in order to solve large-scale spectral clustering problems. Theoretical analysis and experimental comparison show that approximate weighted kernel k-means achieves clustering performance similar to that of weighted kernel k-means, while its time and space complexity is greatly reduced.
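    The key mechanics behind the abstract can be made concrete. Weighted kernel k-means reassigns each point to the cluster whose weighted mean is nearest in feature space, and those squared distances can be evaluated from kernel entries alone (the kernel trick); as Dhillon, Guan, and Kulis showed, with a suitably chosen kernel and degree-based weights this iteration optimizes the normalized-cut objective. The sketch below is a minimal numpy illustration, not the authors' implementation: the linear kernel, the uniform choice of landmarks, and all variable names are assumptions made for the demo. It contrasts the exact distance computation, which needs the full n×n kernel matrix, with an approximate one that touches only an n×m block C and an m×m landmark block W (so K ≈ C W⁺ Cᵀ, in the spirit of Nyström-style approximation):

    ```python
    import numpy as np

    rng = np.random.default_rng(0)

    def kernel_kmeans_distances(K, labels, w, k):
        """Squared feature-space distances ||phi(x_i) - m_c||^2 from every point
        to every weighted cluster mean, using kernel entries only:
          K_ii - 2 * sum_j w_j K_ij / s_c + sum_{j,l} w_j w_l K_jl / s_c^2,
        where s_c is the total weight of cluster c.  Needs the full n x n K."""
        n = K.shape[0]
        d = np.empty((n, k))
        for c in range(k):
            idx = labels == c
            wc = w[idx]
            s = wc.sum()
            cross = K[:, idx] @ wc / s
            within = wc @ K[np.ix_(idx, idx)] @ wc / s**2
            d[:, c] = np.diag(K) - 2.0 * cross + within
        return d

    def approx_distances(C, Winv, labels, w, k):
        """The same distances under K ~= C W^+ C^T, using only the n x m block C
        and the pseudo-inverse of the m x m landmark block W; the full n x n
        kernel matrix is never formed."""
        n = C.shape[0]
        diag = np.einsum('ij,jk,ik->i', C, Winv, C)   # diagonal of C W^+ C^T
        d = np.empty((n, k))
        for c in range(k):
            idx = labels == c
            wc = w[idx]
            s = wc.sum()
            u = C[idx].T @ wc        # m-vector summary of cluster c
            v = Winv @ u / s
            d[:, c] = diag - 2.0 * (C @ v) + (u @ v) / s
        return d

    # Toy setup: low-dimensional points with a linear kernel, so once the m
    # landmarks span the data, C W^+ C^T reproduces K exactly.
    n, dim, m, k = 60, 3, 6, 2
    X = rng.normal(size=(n, dim))
    w = rng.uniform(0.5, 2.0, size=n)       # positive point weights
    labels = rng.integers(0, k, size=n)     # an arbitrary current assignment
    K = X @ X.T                             # full kernel: O(n^2) storage
    Z = X[:m]                               # m uniformly sampled landmarks
    C = X @ Z.T                             # n x m block of K: O(nm) storage
    Winv = np.linalg.pinv(Z @ Z.T)

    d_exact = kernel_kmeans_distances(K, labels, w, k)
    d_approx = approx_distances(C, Winv, labels, w, k)
    max_err = np.abs(d_exact - d_approx).max()
    ```

    In this low-rank toy case the approximate distances coincide with the exact ones; in general the gap depends on how well the sampled block captures the kernel's spectrum, while storage drops from O(n²) to O(nm).
    
    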

Citation: Jia HJ, Ding SF, Shi ZZ. Approximate weighted kernel k-means for large-scale spectral clustering. Ruan Jian Xue Bao/Journal of Software, 2015,26(11):2836-2846.
History
  • Received: February 15, 2015
  • Revised: August 26, 2015
  • Online: November 4, 2015
Copyright: Institute of Software, Chinese Academy of Sciences Beijing ICP No. 05046678-4