Microblog Community Discovery Algorithm Based on Dynamic Topic Model with Multidimensional Data Fusion

Fund Project:

National Science Foundation for Distinguished Young Scholars of China (61225012, 71325002); National Natural Science Foundation of China (61572123, 61300195); Specialized Research Fund of the Doctoral Program of Higher Education (20120042130003); Liaoning BaiQianWan Talents Program (2013921068); Natural Science Foundation of Hebei Province (F2014501078); Technology Planning Project of Hebei Province (15210146)

    Abstract:

    With the rapid growth in the number of microblog users, microblog networks have become a platform on which users exchange information. Because microblog posts are limited in length, traditional community detection algorithms cannot effectively handle the sparsity of microblog networks. To address this problem, this paper proposes the DC-DTM (discovery community by dynamic topic model) algorithm. First, DC-DTM maps the microblog network to a directed weighted network in which the direction of an edge reflects the follow relationship between nodes, and the weight of an edge is the semantic similarity between the two nodes, computed with the proposed DTM (dynamic topic model). DTM is a microblog topic model that not only mines the topic distribution of posts but also calculates each user's influence within a given topic. Second, the algorithm uses the proposed low-complexity label propagation algorithm WLPA (weighted label propagation) to discover communities in the microblog network. In its initialization phase, WLPA selects the most influential user nodes as initial nodes, and labels are then propagated in descending order of node influence. This avoids the backflow phenomenon of traditional label propagation algorithms and improves the stability of label propagation. Experimental results on real data show that DTM mines microblog topics well and that DC-DTM effectively discovers communities in microblog networks.
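The two stages described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The abstract does not say how semantic similarity is computed, so cosine similarity between per-user topic distributions is assumed here. The topic distributions and influence scores, which DTM would produce, are hard-coded hypothetical values. Votes are counted over both edge directions as a simplification, and all function names are illustrative.

```python
import math
from collections import defaultdict


def cosine_similarity(p, q):
    """Cosine similarity of two topic distributions (assumed measure)."""
    dot = sum(a * b for a, b in zip(p, q))
    norm = math.sqrt(sum(a * a for a in p)) * math.sqrt(sum(b * b for b in q))
    return dot / norm if norm else 0.0


def build_weighted_network(follows, topics):
    """Map (follower, followee) pairs to directed edges weighted by similarity."""
    return {(u, v): cosine_similarity(topics[u], topics[v]) for u, v in follows}


def weighted_label_propagation(edges, influence, num_seeds, max_rounds=20):
    """Spread labels from high-influence seeds, visiting nodes by descending influence."""
    # Simplification of this sketch: a node hears votes across both edge directions.
    neighbors = defaultdict(dict)
    for (u, v), w in edges.items():
        neighbors[u][v] = max(w, neighbors[u].get(v, 0.0))
        neighbors[v][u] = max(w, neighbors[v].get(u, 0.0))

    # Most influential nodes first; ties broken by node name for determinism.
    order = sorted(influence, key=lambda n: (-influence[n], n))
    labels = {seed: seed for seed in order[:num_seeds]}  # seeds keep their label

    for _ in range(max_rounds):
        changed = False
        for node in order[num_seeds:]:
            votes = defaultdict(float)
            for other, w in neighbors[node].items():
                if other in labels:
                    votes[labels[other]] += w
            if not votes:
                continue  # no labeled neighbor yet
            best = max(sorted(votes), key=votes.get)  # deterministic tie-break
            if labels.get(node) != best:
                labels[node] = best
                changed = True
        if not changed:
            break
    return labels


# Hypothetical toy data: six users, two topics.
follows = [("b", "a"), ("c", "a"), ("c", "b"),
           ("e", "d"), ("f", "d"), ("f", "e"), ("c", "d")]
topics = {"a": [0.9, 0.1], "b": [0.8, 0.2], "c": [0.7, 0.3],
          "d": [0.1, 0.9], "e": [0.2, 0.8], "f": [0.3, 0.7]}
influence = {"a": 0.9, "d": 0.8, "b": 0.5, "e": 0.4, "c": 0.3, "f": 0.2}

edges = build_weighted_network(follows, topics)
labels = weighted_label_propagation(edges, influence, num_seeds=2)
```

On this toy graph, users a, b and c end up with the label of seed a, and users d, e and f with the label of seed d. User c follows d, but that edge has a low similarity weight, so it is outvoted. Because the seeds never change their label and the remaining nodes are visited in a fixed influence order, repeated runs give the same partition. This illustrates the stability argument made in the abstract.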

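Cite this article

Liu Bingyu, Wang Cuirong, Wang Cong, Wang Junwei, Wang Xingwei, Huang Min. Microblog community discovery algorithm based on dynamic topic model with multidimensional data fusion. Journal of Software (软件学报), 2017, 28(2): 246-261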

History
  • Received: 2015-12-26
  • Last revised: 2016-03-17
  • Published online: 2017-01-24