Clustering Method Based on Nearest Neighbors Representation
Funding: National Natural Science Foundation of China (61422203)
    Abstract:

    With the rapid expansion of information, both the scale and the dimensionality of data keep growing, and many traditional clustering methods have difficulty adapting to this trend. In particular, mobile computing platforms are developing quickly, and their inherent properties limit the amount of memory an algorithm can use, so many existing methods cannot run on such platforms without modification. This paper proposes a clustering method based on nearest neighbors representation. The method uses the idea of nearest neighbors to construct a new representation of the data; because this representation can be compressed, it effectively reduces the storage cost of clustering. An algorithm called Bit k-means is implemented to perform clustering directly on the compressed nearest neighbors representation. Experimental results show that the method improves accuracy while substantially reducing the storage cost.

    Keywords: nearest neighbors; clustering
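The abstract does not spell out how the nearest neighbors representation is built or how Bit k-means operates on it, so the following is only an illustrative sketch under stated assumptions, not the authors' algorithm. It assumes that each point is encoded as a bit vector marking which of a set of anchor points are among its k nearest, that the bit vector is stored as a single Python integer, and that clustering uses Hamming distance with bitwise majority-vote centers. The names `knn_bit_codes`, `hamming`, `majority_code`, and `bit_kmeans` are hypothetical.

```python
import random


def knn_bit_codes(points, anchors, k):
    """Encode each point as an int: bit j is 1 if anchors[j] is one of its k nearest anchors."""
    codes = []
    for p in points:
        # Squared Euclidean distance from p to every anchor.
        dists = [sum((a - b) ** 2 for a, b in zip(p, anchor)) for anchor in anchors]
        nearest = sorted(range(len(anchors)), key=dists.__getitem__)[:k]
        code = 0
        for j in nearest:
            code |= 1 << j
        codes.append(code)
    return codes


def hamming(x, y):
    """Number of bit positions in which two codes differ."""
    return bin(x ^ y).count("1")


def majority_code(codes, n_bits):
    """Bitwise majority vote over a group of codes (ties resolve to 0)."""
    center = 0
    for j in range(n_bits):
        ones = sum((c >> j) & 1 for c in codes)
        if 2 * ones > len(codes):
            center |= 1 << j
    return center


def bit_kmeans(codes, n_bits, n_clusters, n_iter=20, seed=0):
    """Assumed k-means variant on bit codes: Hamming assignment, majority-vote update."""
    rng = random.Random(seed)
    centers = rng.sample(codes, n_clusters)
    labels = [0] * len(codes)
    for _ in range(n_iter):
        # Assignment step: nearest center in Hamming distance.
        labels = [min(range(n_clusters), key=lambda c: hamming(x, centers[c]))
                  for x in codes]
        # Update step: an empty cluster keeps its previous center.
        new_centers = []
        for c in range(n_clusters):
            members = [x for x, lab in zip(codes, labels) if lab == c]
            new_centers.append(majority_code(members, n_bits) if members else centers[c])
        if new_centers == centers:
            break
        centers = new_centers
    return labels, centers


if __name__ == "__main__":
    # Two small, well-separated groups; the points double as anchors (an assumption).
    points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
    codes = knn_bit_codes(points, anchors=points, k=3)
    print(codes)  # [7, 7, 7, 56, 56, 56]
    labels, centers = bit_kmeans(codes, n_bits=len(points), n_clusters=2)
    print(labels, centers)
```

In this sketch each point costs one bit per anchor, whatever the dimensionality of the original features, which is one way a neighbor-based encoding could reduce memory use. How the paper chooses the neighbors, compresses the representation, and updates the centers may differ.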

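Cite this article

Zhou Guobing (周国兵), Wu Jianxin (吴建鑫), Zhou Song (周嵩). Clustering Method Based on Nearest Neighbors Representation. Journal of Software (软件学报), 2015,26(11):2847-2855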

History
  • Received: 2015-05-24
  • Last revised: 2015-08-26
  • Published online: 2015-11-04