聚类算法研究

微信服务号

微信订阅号

2025年4月7日 6:03 星期一

首页 > 过刊浏览>2008年第19卷第1期 >48-61

聚类算法研究
DOI:
                        
                    
CSTR:
                        
                    
作者:
                        孙吉贵孙吉贵
吉林大学 计算机科学与技术学院,吉林 长春 130012; 符号计算与知识工程教育部重点实验室,吉林 长春 130012
在期刊界中查找
在百度中查找
在本站中查找
刘 杰刘 杰
吉林大学 计算机科学与技术学院,吉林 长春 130012; 符号计算与知识工程教育部重点实验室,吉林 长春 130012
在期刊界中查找
在百度中查找
在本站中查找
赵连宇赵连宇
吉林大学 计算机科学与技术学院,吉林 长春 130012; 符号计算与知识工程教育部重点实验室,吉林 长春 130012
在期刊界中查找
在百度中查找
在本站中查找

                    
作者单位:
作者简介:
通讯作者:
中图分类号:
基金项目:Supported by the National Natural Science Foundation of China under Grant Nos.60473003, 60573073 (国家自然科学基金); the Major Research Program of National Natural Science Foundation of China under Grant No.60496321 (国家自然科学基金重大项目)

Clustering Algorithms Research

Author:

SUN Ji-Gui
SUN Ji-Gui

在期刊界中查找
在百度中查找
在本站中查找
LIU Jie1
LIU Jie1

在期刊界中查找
在百度中查找
在本站中查找
ZHAO Lian-Yu
ZHAO Lian-Yu

在期刊界中查找
在百度中查找
在本站中查找

Affiliation:

Fund Project:

摘要

图/表

访问统计

参考文献 [37]

相似文献

引证文献

资源附件

文章评论

摘要:

对近年来聚类算法的研究现状与新进展进行归纳总结.一方面对近年来提出的较有代表性的聚类算法,从算法思想、关键技术和优缺点等方面进行分析概括;另一方面选择一些典型的聚类算法和一些知名的数据集,主要从正确率和运行效率两个方面进行模拟实验,并分别就同一种聚类算法、不同的数据集以及同一个数据集、不同的聚类算法的聚类情况进行对比分析.最后通过综合上述两方面信息给出聚类分析的研究热点、难点、不足和有待解决的一些问题.上述工作将为聚类分析和数据挖掘等研究提供有益的参考.

关键词:聚类;算法;实验

Abstract:

The research actuality and new progress in clustering algorithm in recent years are summarized in this paper. First, the analysis and induction of some representative clustering algorithms have been made from several aspects, such as the ideas of algorithm, key technology, advantage and disadvantage. On the other hand, several typical clustering algorithms and known data sets are selected, simulation experiments are implemented from both sides of accuracy and running efficiency, and clustering condition of one algorithm with different data sets is analyzed by comparing with the same clustering of the data set under different algorithms. Finally, the research hotspot, difficulty, shortage of the data clustering and some pending problems are addressed by the integration of the aforementioned two aspects information. The above work can give a valuable reference for data clustering and data mining.

Key words:clustering; algorithm; experiment

参考文献

[1] Jain AK, Flynn PJ. Image segmentation using clustering. In: Ahuja N, Bowyer K, eds. Advances in Image Understanding: A Festchrift for Azriel Rosenfeld. Piscataway: IEEE Press, 1996. 65-83.

[2] Cades I, Smyth P, Mannila H. Probabilistic modeling of transactional data with applications to profiling, visualization and prediction, sigmod. In: Proc. of the 7th ACM SIGKDD. San Francisco: ACM Press, 2001. 37-46. http://www.sigkdd.org/kdd2001/

[3] Jain AK, Murty MN, Flynn PJ. Data clustering: A review. ACM Computing Surveys, 1999,31(3):264-323.

[4] Gelbard R, Goldman O, Spiegler I. Investigating diversity of clustering methods: An empirical comparison. Data & Knowledge Engineering, 2007,63(1):155-166.

[5] Jain AK, Dubes RC. Algorithms for Clustering Data. Prentice-Hall Advanced Reference Series, 1988. 1-334.

[6] Jain AK, Duin RPW, Mao JC. Statistical pattern recognition: A review. IEEE Trans. on Pattern Analysis and Machine Intelligence, 2000,22(1):4-37.

[7] Sambasivam S, Theodosopoulos N. Advanced data clustering methods of mining Web documents. Issues in Informing Science and Information Technology, 2006,(3):563-579.

[8] Marques JP, Written; Wu YF, Trans. Pattern Recognition Concepts, Methods and Applications. 2nd ed., Beijing: Tsinghua University Press, 2002. 51-74 (in Chinese).

[9] Fred ALN, Leit?o JMN. Partitional vs hierarchical clustering using a minimum grammar complexity approach. In: Proc. of the SSPR&SPR 2000. LNCS 1876, 2000. 193?202. http://www.sigmod.org/dblp/db/conf/sspr/sspr2000.html

[10] Gelbard R, Spiegler I. Hempel's raven paradox: A positive approach to cluster analysis. Computers and Operations Research, 2000, 27(4):305-320.

[11] Zhang B, Srihari SN. Properties of binary vector dissimilarity measures. In: Proc. of the JCIS CVPRIP 2003. 2003. 26-30. http://www.ee.duke.edu/JCIS/

[12] Kumar P, Krishna PR, Bapi RS, De SK. Rough clustering of sequential data. Data & Knowledge Engineering, 2007,3(2):183-199.

[13] Huang Z. A fast clustering algorithm to cluster very large categorical data sets in data mining. In: Proc. of the SIGMOD Workshop on Research Issues on Data Mining and Knowledge Discovery. Tucson, 1997. 146-151. http://www.informatik.uni-trier.de/～ley/db/ conf/sigmod/sigmod97.html

[14] Huang Z. Extensions to the k-means algorithm for clustering large data sets with categorical values. Data Mining and Knowledge, Discovery II, 1998,(2):283-304.

[15] Huang Z, Ng MA. Fuzzy k-modes algorithm for clustering categorical data. IEEE Trans. on Fuzzy Systems, 1999,7(4):446-452.

[16] Chaturvedi AD, Green PE, Carroll JD. K-modes clustering. Journal of Classification, 2001,18(1):35-56.

[17] Goodman LA. Exploratory latent structure analysis using both identifiable and unidentifiable models. Biometrika, 1974,61(2): 215-231.

[18] Huang ZX, Michael K. A note on K-modes clustering. Journal of Classification, 2003,20(2):257-26.

[19] Sun Y, Zhu QM, Chen ZX. An iterative initial-points refinement algorithm for categorical data clustering. Pattern Recognition Letters, 2002,23(7):875-884.

[20] Bradley PS, Fayyad UM. Refining initial points for k-means clustering. In: Proc. of the 15th Internet Conf. on Machine Learning. San Francisco: Morgan Kaufmann Publishers, 1998. 91?99. http://www.cs.wisc.edu/icml98/

[21] http://www.ics.uci.edu/～mlearn/databases/

[22] Ding C, He X. K-Nearest-Neighbor in data clustering: Incorporating local information into global optimization. In: Proc. of the ACM Symp. on Applied Computing. Nicosia: ACM Press, 2004. 584-589. http://www.acm.org/conferences/sac/sac2004/

[23] Lyer NS, Kandel A, Schneider M. Feature-Based fuzzy classification for interpretation of mammograms. Fuzzy Sets System, 2000, 114(2):271-280.

[24] Yang MS, Hu YJ, Lin KCR, Lin CCL. Segmenttation techniques for tissue differentiation in MRI of ophthalmology using fuzzy clustering algorithm. Journal of Magnetic Resonance Imaging, 2002,(20):173-179.

[25] Li J, Gao XB, Jiao LC. A new feature weighted fuzzy clustering algorithm. ACTA Electronica Sinica, 2006,34(1):412-420 (in Chinese with English abstract).

[26] Kononenko I. Estimating attributes: Analysis and extensions of relief. In: Proc, of the 17th European Conf. On Machine Learning. LNCS 784, 1994. 171-182.

[27] Cai WL, Chen SC, Zhang DQ. Fast and robust fuzzy c-means clustering algorithms incorporating local information for image segmentation. Pattern Recognition, 2007,40(3):825-833.

[28] Harel D, Koren Y. Clustering spatial data using random walks. In: Proc. of the 7th ACM SIGKDD Int'l Conf. Knowledge Discovery and Data Mining. New York: ACM Press, 2001. 281-286. http://www.sigkdd.org/kdd2001/

[29] Karypis G, Han EH, Kumar V. CHANELEON: A hierarchical clustering algorithm using dynamic modeling. IEEE Computer, 1999, 2(8):68-75.

[30] Estivill-Castro V, Lee I. AUTOCLUST: Automatic clustering via boundary extraction for mining massive point-data sets. In: Abrahart J, Carlisle BH, eds. Proc. of the 5th Int'l Conf. on Geocomputation. 2000. 23-25. http://www.geocomputation.org/ 2000/index.html

[31] Li YJ. A clustering algorithm based on maximal (-distant subtrees. Pattern Recognition, 2007,40(5):1425-1431.

[32] Zhao YC, Song J. GDILC: A grid-based density isoline clustering algorithm. In: Zhong YX, Cui S, Yang Y, eds. Proc. of the Internet Conf. on Info-Net. Beijing: IEEE Press, 2001. 140-145. http://ieeexplore.ieee.org/iel5/7719/21161/00982709.pdf

[33] Ma WM, Chow E, Tommy WS. A new shifting grid clustering algorithm. Pattern Recognition, 2004,37(3):503-514.

[34] Pilevar AH, Sukumar M. GCHL: A grid-clustering algorithm for high-dimensional very large spatial data bases. Pattern Recognition Letters, 2005,26(7):999-1010.

[35] Nanni M, Pedreschi D. Time-Focused clustering of trajectories of moving objects. Journal of Intelligent Information Systems, 2006, 27(3):267-289.

[36] Birant D, Kut A. ST-DBSCAN: An algorithm for clustering spatial-temporal data. Data & Knowledge Engineering, 2007,60(1): 208-221.

[37] Tsai CF, Tsai CW, Wu HC, Yang T. ACODF: A novel data clustering approach for data mining in large databases. Journal of Systems and Software, 2004,73(1):133-145.

引用本文

孙吉贵,刘杰,赵连宇.聚类算法研究.软件学报,2008,19(1):48-61

复制

文章指标

点击次数:28795
下载次数: 64216
HTML阅读次数: 0
引用次数: 0

历史

收稿日期:2007-04-24
最后修改日期:2007-08-03
录用日期:
在线发布日期:
出版日期:

微信服务号

微信订阅号

引用本文

分享

文章指标

历史

文章二维码

微信服务号

微信订阅号

引用本文

分享

微信扫一扫：分享

文章指标

历史

文章二维码