Overlapping Community Discovery and Global Representation on Microblog Networks
Authors: Hu Yun, Wang Chongjun, Wu Jun, Xie Junyuan, Li Hui
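Funding:

National Natural Science Foundation of China (61403156, 61375069, 61105069); National Postdoctoral Science Foundation of China (2011M500846); Natural Science Foundation of Jiangsu Province (11KJB520001, 13KJB520002); Jiangsu Province Science and Technology Support Program (BE2012181)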


    Abstract:

    Microblog networks are emerging multi-mode networks that cover a massive number of users, span a wide range of topics, and have a complex overlapping community structure. Based on an in-depth study of the intrinsic relationships among the entities and attributes of a microblog network, this paper proposes an overlapping community representation model that uses the user-topic relation as its primary partitioning principle, together with a corresponding community detection algorithm. The method considers not only the user-topic relation but also the composite relations specific to this kind of network: the follow relation between users and the comment and forward relations on posts. The paper also improves the traditional community membership matrix. By introducing a virtual community alongside the actual communities, the membership matrix reflects each individual's degree of membership in a community and, at the same time, indicates how central that individual is within it. Experiments on a Sina Weibo dataset show that the model and method characterize the overlapping community structure of the dataset efficiently and reasonably, and that the results are readily interpretable. The proposed model and algorithm can be applied to community detection and community evolution analysis on similar multi-mode networks.
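
To make the virtual-community idea more concrete, the following minimal Python sketch builds a membership matrix with one extra virtual column. The abstract does not give the paper's formulas, so everything here is an assumption made for illustration: the function name `global_membership`, the toy affinity scores, and the scaling rule (divide by the largest row total and assign the remainder to the virtual community). It should not be read as the authors' algorithm.

```python
def global_membership(scores):
    """Build a membership matrix with an extra virtual-community column.

    scores: one row per user, holding non-negative affinity scores for the
    k real communities (hypothetical input, e.g. derived from user-topic,
    follow, comment, and forward relations).
    Returns rows of length k + 1 that each sum to 1; the last entry is the
    share assigned to the virtual community.
    """
    if not scores:
        return []
    # Assumption: scale every row by the largest row total, so the most
    # involved user has no virtual share at all.
    scale = max(sum(row) for row in scores)
    if scale <= 0:
        # No user has any affinity: everyone sits in the virtual community.
        return [[0.0] * len(row) + [1.0] for row in scores]
    membership = []
    for row in scores:
        real = [s / scale for s in row]
        # Whatever is not explained by real communities goes to the virtual one.
        virtual = max(0.0, 1.0 - sum(real))
        membership.append(real + [virtual])
    return membership


# Toy data (made up): three users, two real communities.
scores = [
    [4.0, 4.0],  # strongly involved in both communities (overlap)
    [2.0, 0.0],  # moderately involved in the first community only
    [0.0, 1.0],  # weakly involved in the second community only
]
membership = global_membership(scores)
for user, row in enumerate(membership):
    print(user, row)
```

Under these assumptions, several nonzero real columns in a row indicate overlapping membership, and a small virtual share can be read as a rough indicator that the user is central rather than peripheral.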

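Cite this article

Hu Yun, Wang Chongjun, Wu Jun, Xie Junyuan, Li Hui. Overlapping community discovery and global representation on microblog networks. Journal of Software, 2014,25(12):2824-2836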

History
  • Received: 2014-04-10
  • Last revised: 2014-08-21
  • Published online: 2014-12-04