聚类算法研究
作者:
基金项目:

Supported by the National Natural Science Foundation of China under Grant Nos.60473003, 60573073 (国家自然科学基金); the Major Research Program of National Natural Science Foundation of China under Grant No.60496321 (国家自然科学基金重大项目)

  • 摘要
  • | |
  • 访问统计
  • |
  • 参考文献 [37]
  • |
  • 相似文献
  • |
  • 引证文献
  • | |
  • 文章评论
    摘要:

    对近年来聚类算法的研究现状与新进展进行归纳总结.一方面对近年来提出的较有代表性的聚类算法,从算法思想、关键技术和优缺点等方面进行分析概括;另一方面选择一些典型的聚类算法和一些知名的数据集,主要从正确率和运行效率两个方面进行模拟实验,并分别就同一种聚类算法、不同的数据集以及同一个数据集、不同的聚类算法的聚类情况进行对比分析.最后通过综合上述两方面信息给出聚类分析的研究热点、难点、不足和有待解决的一些问题.上述工作将为聚类分析和数据挖掘等研究提供有益的参考.

    关键词:聚类;算法;实验
    Abstract:

    The research actuality and new progress in clustering algorithm in recent years are summarized in this paper. First, the analysis and induction of some representative clustering algorithms have been made from several aspects, such as the ideas of algorithm, key technology, advantage and disadvantage. On the other hand, several typical clustering algorithms and known data sets are selected, simulation experiments are implemented from both sides of accuracy and running efficiency, and clustering condition of one algorithm with different data sets is analyzed by comparing with the same clustering of the data set under different algorithms. Finally, the research hotspot, difficulty, shortage of the data clustering and some pending problems are addressed by the integration of the aforementioned two aspects information. The above work can give a valuable reference for data clustering and data mining.

    参考文献
    [1] Jain AK, Flynn PJ. Image segmentation using clustering. In: Ahuja N, Bowyer K, eds. Advances in Image Understanding: A Festchrift for Azriel Rosenfeld. Piscataway: IEEE Press, 1996. 65-83.
    [2] Cades I, Smyth P, Mannila H. Probabilistic modeling of transactional data with applications to profiling, visualization and prediction, sigmod. In: Proc. of the 7th ACM SIGKDD. San Francisco: ACM Press, 2001. 37-46. http://www.sigkdd.org/kdd2001/
    [3] Jain AK, Murty MN, Flynn PJ. Data clustering: A review. ACM Computing Surveys, 1999,31(3):264-323.
    [4] Gelbard R, Goldman O, Spiegler I. Investigating diversity of clustering methods: An empirical comparison. Data & Knowledge Engineering, 2007,63(1):155-166.
    [5] Jain AK, Dubes RC. Algorithms for Clustering Data. Prentice-Hall Advanced Reference Series, 1988. 1-334.
    [6] Jain AK, Duin RPW, Mao JC. Statistical pattern recognition: A review. IEEE Trans. on Pattern Analysis and Machine Intelligence, 2000,22(1):4-37.
    [7] Sambasivam S, Theodosopoulos N. Advanced data clustering methods of mining Web documents. Issues in Informing Science and Information Technology, 2006,(3):563-579.
    [8] Marques JP, Written; Wu YF, Trans. Pattern Recognition Concepts, Methods and Applications. 2nd ed., Beijing: Tsinghua University Press, 2002. 51-74 (in Chinese).
    [9] Fred ALN, Leit?o JMN. Partitional vs hierarchical clustering using a minimum grammar complexity approach. In: Proc. of the SSPR&SPR 2000. LNCS 1876, 2000. 193?202. http://www.sigmod.org/dblp/db/conf/sspr/sspr2000.html
    [10] Gelbard R, Spiegler I. Hempel's raven paradox: A positive approach to cluster analysis. Computers and Operations Research, 2000, 27(4):305-320.
    [11] Zhang B, Srihari SN. Properties of binary vector dissimilarity measures. In: Proc. of the JCIS CVPRIP 2003. 2003. 26-30. http://www.ee.duke.edu/JCIS/
    [12] Kumar P, Krishna PR, Bapi RS, De SK. Rough clustering of sequential data. Data & Knowledge Engineering, 2007,3(2):183-199.
    [13] Huang Z. A fast clustering algorithm to cluster very large categorical data sets in data mining. In: Proc. of the SIGMOD Workshop on Research Issues on Data Mining and Knowledge Discovery. Tucson, 1997. 146-151. http://www.informatik.uni-trier.de/~ley/db/ conf/sigmod/sigmod97.html
    [14] Huang Z. Extensions to the k-means algorithm for clustering large data sets with categorical values. Data Mining and Knowledge, Discovery II, 1998,(2):283-304.
    [15] Huang Z, Ng MA. Fuzzy k-modes algorithm for clustering categorical data. IEEE Trans. on Fuzzy Systems, 1999,7(4):446-452.
    [16] Chaturvedi AD, Green PE, Carroll JD. K-modes clustering. Journal of Classification, 2001,18(1):35-56.
    [17] Goodman LA. Exploratory latent structure analysis using both identifiable and unidentifiable models. Biometrika, 1974,61(2): 215-231.
    [18] Huang ZX, Michael K. A note on K-modes clustering. Journal of Classification, 2003,20(2):257-26.
    [19] Sun Y, Zhu QM, Chen ZX. An iterative initial-points refinement algorithm for categorical data clustering. Pattern Recognition Letters, 2002,23(7):875-884.
    [20] Bradley PS, Fayyad UM. Refining initial points for k-means clustering. In: Proc. of the 15th Internet Conf. on Machine Learning. San Francisco: Morgan Kaufmann Publishers, 1998. 91?99. http://www.cs.wisc.edu/icml98/
    [21] http://www.ics.uci.edu/~mlearn/databases/
    [22] Ding C, He X. K-Nearest-Neighbor in data clustering: Incorporating local information into global optimization. In: Proc. of the ACM Symp. on Applied Computing. Nicosia: ACM Press, 2004. 584-589. http://www.acm.org/conferences/sac/sac2004/
    [23] Lyer NS, Kandel A, Schneider M. Feature-Based fuzzy classification for interpretation of mammograms. Fuzzy Sets System, 2000, 114(2):271-280.
    [24] Yang MS, Hu YJ, Lin KCR, Lin CCL. Segmenttation techniques for tissue differentiation in MRI of ophthalmology using fuzzy clustering algorithm. Journal of Magnetic Resonance Imaging, 2002,(20):173-179.
    [25] Li J, Gao XB, Jiao LC. A new feature weighted fuzzy clustering algorithm. ACTA Electronica Sinica, 2006,34(1):412-420 (in Chinese with English abstract).
    [26] Kononenko I. Estimating attributes: Analysis and extensions of relief. In: Proc, of the 17th European Conf. On Machine Learning. LNCS 784, 1994. 171-182.
    [27] Cai WL, Chen SC, Zhang DQ. Fast and robust fuzzy c-means clustering algorithms incorporating local information for image segmentation. Pattern Recognition, 2007,40(3):825-833.
    [28] Harel D, Koren Y. Clustering spatial data using random walks. In: Proc. of the 7th ACM SIGKDD Int'l Conf. Knowledge Discovery and Data Mining. New York: ACM Press, 2001. 281-286. http://www.sigkdd.org/kdd2001/
    [29] Karypis G, Han EH, Kumar V. CHANELEON: A hierarchical clustering algorithm using dynamic modeling. IEEE Computer, 1999, 2(8):68-75.
    [30] Estivill-Castro V, Lee I. AUTOCLUST: Automatic clustering via boundary extraction for mining massive point-data sets. In: Abrahart J, Carlisle BH, eds. Proc. of the 5th Int'l Conf. on Geocomputation. 2000. 23-25. http://www.geocomputation.org/ 2000/index.html
    [31] Li YJ. A clustering algorithm based on maximal (-distant subtrees. Pattern Recognition, 2007,40(5):1425-1431.
    [32] Zhao YC, Song J. GDILC: A grid-based density isoline clustering algorithm. In: Zhong YX, Cui S, Yang Y, eds. Proc. of the Internet Conf. on Info-Net. Beijing: IEEE Press, 2001. 140-145. http://ieeexplore.ieee.org/iel5/7719/21161/00982709.pdf
    [33] Ma WM, Chow E, Tommy WS. A new shifting grid clustering algorithm. Pattern Recognition, 2004,37(3):503-514.
    [34] Pilevar AH, Sukumar M. GCHL: A grid-clustering algorithm for high-dimensional very large spatial data bases. Pattern Recognition Letters, 2005,26(7):999-1010.
    [35] Nanni M, Pedreschi D. Time-Focused clustering of trajectories of moving objects. Journal of Intelligent Information Systems, 2006, 27(3):267-289.
    [36] Birant D, Kut A. ST-DBSCAN: An algorithm for clustering spatial-temporal data. Data & Knowledge Engineering, 2007,60(1): 208-221.
    [37] Tsai CF, Tsai CW, Wu HC, Yang T. ACODF: A novel data clustering approach for data mining in large databases. Journal of Systems and Software, 2004,73(1):133-145.
    相似文献
    网友评论
    网友评论
    分享到微博
    发 布
引用本文

孙吉贵,刘 杰,赵连宇.聚类算法研究.软件学报,2008,19(1):48-61

复制
分享
文章指标
  • 点击次数:28795
  • 下载次数: 64216
  • HTML阅读次数: 0
  • 引用次数: 0
历史
  • 收稿日期:2007-04-24
  • 最后修改日期:2007-08-03
文章二维码
您是第19796428位访问者
版权所有:中国科学院软件研究所 京ICP备05046678号-3
地址:北京市海淀区中关村南四街4号,邮政编码:100190
电话:010-62562563 传真:010-62562533 Email:jos@iscas.ac.cn
技术支持:北京勤云科技发展有限公司

京公网安备 11040202500063号