Cluster Ensemble Model Based on Latent Variables
Authors: 王红军, 李志蜀, 成飏, 周鹏, 周维
Funding: Supported by the China Scholarship Council Foundation under Grant No. 2007U24068

    Abstract:

    Cluster ensembles have become an active research topic in machine learning because they can protect private information, process data in a distributed manner, and reuse knowledge; in addition, noise and outliers have little effect on the ensemble result. This paper makes two contributions. First, it analyzes the advantages of treating each base clustering as an attribute of the original data and finds that cluster ensemble algorithms built in this way are extensible and flexible. Second, on this basis, it builds a probabilistic model, the latent variable cluster ensemble (LVCE), and gives a Markov chain Monte Carlo (MCMC) algorithm for inference in the model. Experimental results show that the MCMC algorithm for LVCE performs cluster ensemble with good results and also reflects how compactly the data points are clustered.
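The idea of treating each base clustering as an attribute of the data can be illustrated with a small sketch. The code below is not the paper's LVCE model, whose details are not given on this page. It is a generic finite mixture of categorical distributions with symmetric Dirichlet priors, sampled with a plain Gibbs sampler (one kind of MCMC). The function name `gibbs_consensus` and the parameters `k`, `alpha`, `beta`, `n_iter`, and `burn_in` are assumptions made for illustration only.

```python
import random


def gibbs_consensus(labels, k, n_iter=200, burn_in=50, alpha=1.0, beta=1.0, seed=0):
    """Illustrative Gibbs sampler for a categorical mixture (not the paper's LVCE).

    labels[i][j] is the cluster id (0-based) that base clustering j gave to point i,
    so each base clustering acts as one categorical attribute of point i.
    Returns the last sampled consensus assignment and a co-assignment frequency matrix.
    """
    rng = random.Random(seed)
    n, m = len(labels), len(labels[0])
    # Number of distinct labels used by each base clustering.
    sizes = [max(row[j] for row in labels) + 1 for j in range(m)]
    z = [rng.randrange(k) for _ in range(n)]  # latent consensus cluster per point
    co = [[0] * n for _ in range(n)]
    kept = 0

    def dirichlet(params):
        # Dirichlet draw via normalized Gamma variates.
        draws = [rng.gammavariate(a, 1.0) for a in params]
        total = sum(draws)
        return [d / total for d in draws]

    for it in range(n_iter):
        # Sample mixing weights given the current assignments.
        pi = dirichlet([alpha + sum(1 for zi in z if zi == c) for c in range(k)])
        # Sample, for each consensus cluster and attribute, a label distribution.
        theta = []
        for c in range(k):
            per_attr = []
            for j in range(m):
                cnt = [beta] * sizes[j]
                for i in range(n):
                    if z[i] == c:
                        cnt[labels[i][j]] += 1
                per_attr.append(dirichlet(cnt))
            theta.append(per_attr)
        # Sample each point's consensus cluster given pi and theta.
        for i in range(n):
            weights = []
            for c in range(k):
                p = pi[c]
                for j in range(m):
                    p *= theta[c][j][labels[i][j]]
                weights.append(p)
            z[i] = rng.choices(range(k), weights=weights)[0]
        # After burn-in, record how often each pair shares a consensus cluster.
        if it >= burn_in:
            kept += 1
            for a in range(n):
                for b in range(n):
                    if z[a] == z[b]:
                        co[a][b] += 1

    freq = [[co[a][b] / kept for b in range(n)] for a in range(n)]
    return z, freq


if __name__ == "__main__":
    # Six points, three base clusterings; label ids differ between clusterings.
    example = [[0, 1, 0], [0, 1, 0], [0, 1, 0], [1, 0, 1], [1, 0, 1], [1, 0, 1]]
    assignment, pair_freq = gibbs_consensus(example, k=2)
    print(assignment)
    print(pair_freq[0])
```

In this sketch the co-assignment frequencies are one possible way to read off how tightly points group together across samples; whether the paper measures compactness this way is not stated here.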

Cite this article

王红军, 李志蜀, 成飏, 周鹏, 周维. Cluster ensemble model based on latent variables. Journal of Software (软件学报), 2009, 20(4): 825-833

History
  • Received: 2008-03-13
  • Last revised: 2008-08-11