Microblog Community Discovery Algorithm Based on Dynamic Topic Model with Multidimensional Data Fusion
Author: Liu Bingyu, Wang Cuirong, Wang Cong, Wang Junwei, Wang Xingwei, Huang Min
Affiliation:

Fund Project:

National Science Foundation for Distinguished Young Scholars of China (61225012, 71325002); National Natural Science Foundation of China (61572123, 61300195); Specialized Research Fund of the Doctoral Program of Higher Education (20120042130003); Liaoning BaiQianWan Talents Program (2013921068); Natural Science Foundation of Hebei Province (F2014501078); Technology Planning Project of Hebei Province (15210146)

    Abstract:

    With the rapid growth in the number of microblog users, microblog websites have become a platform where a wide range of users obtain information. Because a microblog post is a special type of text with restricted length, traditional community detection algorithms cannot effectively handle the sparsity of microblog data. To address this issue, this paper proposes the DC-DTM (discovery community by dynamic topic model) algorithm. First, the algorithm maps the microblog to a directed weighted network, in which the direction of an edge represents the following relationship and the weight is the topic similarity between nodes, calculated by DTM (dynamic topic model). DTM is a microblog topic model that can not only mine the topics of each microblog accurately but also calculate an author's influence on a topic. Second, the algorithm uses WLPA (weighted label propagation), a low-complexity label propagation algorithm, to find communities in the microblog network. The initialization step selects the nodes with the largest influence as the initial nodes, and labels are then propagated in descending order of node influence. The algorithm overcomes the adverse phenomenon in the traditional label propagation algorithm and has better stability. Experiments on real data show that the DTM model performs well for topic mining in microblogs and that the DC-DTM algorithm can effectively discover microblog communities.
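The two steps described in the abstract can be illustrated with a minimal Python sketch. It is not the paper's implementation: the abstract does not give the DTM similarity formula or the exact WLPA update rule, so the cosine similarity, the direction in which labels flow (from followed user to follower), the fixed seed labels, the tie-breaking rule, and all function and variable names here are assumptions made for illustration only. Topic distributions and influence scores are taken as given inputs rather than learned by a topic model.

```python
from collections import defaultdict
import math


def topic_similarity(p, q):
    """Cosine similarity between two topic distributions (assumed measure)."""
    dot = sum(a * b for a, b in zip(p, q))
    norm = math.sqrt(sum(a * a for a in p)) * math.sqrt(sum(b * b for b in q))
    return dot / norm if norm else 0.0


def build_network(follows, topics):
    """Turn (follower, followee) pairs into weighted incoming-edge lists."""
    incoming = defaultdict(list)
    for follower, followee in follows:
        weight = topic_similarity(topics[follower], topics[followee])
        # Assumption: labels flow from the followed user to the follower.
        incoming[follower].append((followee, weight))
    return incoming


def weighted_label_propagation(nodes, incoming, influence, num_seeds, max_iter=20):
    """Propagate labels in descending order of influence until stable."""
    order = sorted(nodes, key=lambda n: (-influence[n], n))
    seeds = set(order[:num_seeds])
    labels = {n: n for n in seeds}  # each seed starts with its own label
    for _ in range(max_iter):
        changed = False
        for node in order:
            if node in seeds:
                continue  # assumption: seed labels stay fixed
            score = defaultdict(float)
            for neighbor, weight in incoming.get(node, []):
                if neighbor in labels:
                    score[labels[neighbor]] += weight
            if not score:
                continue  # no labelled neighbour yet
            # Pick the label with the largest total weight; ties go to the
            # alphabetically smallest label so the result is deterministic.
            best = max(sorted(score), key=lambda lab: score[lab])
            if labels.get(node) != best:
                labels[node] = best
                changed = True
        if not changed:
            break
    return labels


# Toy data (hypothetical): two topical groups joined by one cross edge.
topics = {
    "a": [0.9, 0.1], "b": [0.9, 0.1], "c": [0.9, 0.1],
    "d": [0.1, 0.9], "e": [0.1, 0.9], "f": [0.1, 0.9],
}
influence = {"a": 0.9, "d": 0.8, "b": 0.5, "e": 0.4, "c": 0.3, "f": 0.2}
follows = [("b", "a"), ("c", "a"), ("c", "b"), ("c", "d"),
           ("e", "d"), ("f", "d"), ("f", "e")]

incoming = build_network(follows, topics)
communities = weighted_label_propagation(list(topics), incoming, influence, num_seeds=2)
print(communities)
```

In this toy network, user c follows users from both groups, but the low topic similarity on the cross edge to d gives that edge little weight, so c is assigned to the group whose topics it shares. Processing nodes in a fixed influence order, rather than a random one, is what makes repeated runs return the same partition.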

Get Citation

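Liu BY, Wang CR, Wang C, Wang JW, Wang XW, Huang M. Microblog community discovery algorithm based on dynamic topic model with multidimensional data fusion. Ruan Jian Xue Bao/Journal of Software, 2017,28(2):246-261.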

History
  • Received: December 26, 2015
  • Revised: March 17, 2016
  • Online: January 24, 2017