基于概率模型的大规模网络结构发现方法
作者:
基金项目:

国家自然科学基金(61473030,61370129);中央高校科研业务经费(2014YJS039);河北省自然科学基金(F2013205192);北京市科委项目(Z131110002813118);北大方正集团有限公司数字出版技术国家重点实验室开放课题;


Approaches of Structure Exploratory Based on Probabilistic Models in Massive Networks
Author:
  • 摘要
  • | |
  • 访问统计
  • |
  • 参考文献 [34]
  • |
  • 相似文献 [20]
  • | | |
  • 文章评论
    摘要:

    随着万维网和在线社交网站的发展,规模大、结构复杂、动态性强的大规模网络应用而生.发现这些网络的潜在结构,是分析和理解网络数据的基本途径.概率模型以其灵活的建模和解释能力、坚实的理论框架成为各领域研究网络结构发现任务的有效工具,但该类方法存在计算瓶颈.近几年出现了一些基于概率模型的大规模网络结构发现方法,主要从网络表示、结构假设、参数求解这3个方面解决计算问题.按照模型参数求解策略将已有方法归为两类:随机变分推理(stochastic variational inference)方法和在线EM(online expectation maximazation)方法,详细分析各方法的设计动机、原理和优缺点.定性和定量地对比、分析典型方法的特点和性能,并提出大规模网络结构发现模型的设计原则.最后,概括该领域研究的核心问题,展望未来发展趋势.

    Abstract:

    The growth of the Internet and the emergence of online social websites bring up the development of massive networks which are large in scale, complex in structure, and dynamical in time. Exploring latent structure underlying a network is the fundamental solution to understand and analyze the network. Probabilistic models become effective tools in diverse areas of structure exploratory due to their flexibility in modeling, interpretability and the sound theoretical framework, however they incur computational bottlenecks. Recently, several approaches based on probabilistic models have been developed to explore structure in massive networks, which aim to solve the computational problems from three aspects: representations of a network, assumptions of the structure and methods of parameter estimation. This study classifies existing approaches as two categories by the methods of parameter estimation: approaches based on stochastic variational inference and online EM approaches, and analyzes in detail their designing incentives, principles, pros and cons. The properties and performance of classical models are compared and analyzed qualitatively and quantitatively, and as a result the principles are provided to develop approaches of structure detection in massive networks. Finally, the core problems of structure exploratory in massive networks are summarized based on probabilistic models and the development trend of this area is projected.

    参考文献
    [1] Wang YJ, Wong GY. Stochastic blockmodels for directed graphs. Journal of the American Statistical Association, 1987,82(397): 8-19. [doi: 10.1080/01621459.1987.10478385]
    [2] Airodi EM, Blei DM, Fienberg SE, Stephen E, Eric XE. Mixed membership stochastic blockmodels. Journal of Machine Learning Research, 2008,9(1):1981-2014.
    [3] Ren W, Yan GY, Liao XP, Lan X. Simple probabilistic algorithm for detecting community structure, Physical Review E, 2009, 79(3):036111.
    [4] Shen HW, Cheng XQ, Guo JF. Exploring the structural regularities in networks. Physical Review E, 2011,84(5):056111. [doi: 10.1103/PhysRevE.84.056111]
    [5] Karrer B, Newman MEJ. Stochastic blockmodels and community structure in networks. Physical Review E, 2011,83(11):016107. [doi: 10.1103/PhysRevE.83.016107]
    [6] Ball B, Karrer B, Newman MEJ. An efficient and principled method for detecting communities in network. Physical Review, 2011, 84(3):036103. [doi: 10.1103/PhysRevE.84.036103]
    [7] Hofman JM, Wiggins CH. Bayesian approach to network modularity. Physical Review Letters, 2008,100(25):258701. [doi: 10. 1103/PhysRevLett.100.258701]
    [8] Latouche P, Birmele E, Ambroise C. Overlapping stochastic block models with application to the french political blogosphere. The Annals of Applied Statistics, 2011,5(1):309-336. [doi: 10.1214/10-AOAS382]
    [9] Chai BF, Yu J, Jia CY, Yang TB, Jiang YW. Combining a popularity-productivity stochastic block model with a discriminative- content model for general structure detection. Physical Review E, 2013,88(3):012807. [doi: 10.1103/PhysRevE.88.012807]
    [10] Yang B, Liu JM, Liu D. Characterizing and extracting multiplex patterns in complex networks. IEEE Trans. on Systems, Man, and Cybernetics—Part B: Cybernetics, 2012,42(2):469-481. [doi: 10.1109/TSMCB.2011.2167751]
    [11] Nowicki K, Snijders T. Estimation and prediction for stochastic block structures. Journal of American Statistical Association, 2001, 96(455):1077-1087. [doi: 10.1198/016214501753208735]
    [12] Daudin J, Picard F, Robin S. A mixture model for random graphs. Journal of Statistical Computation and Simulation, 2008,18(2):179-183. [doi: 10.1007/s11222-007-9046-7]
    [13] Ho Q, Parikh AP, Xing EP. A multiscale community blockmodel for network exploration. Journal of the American Statistical Association, 2012,107(499):916-934. [doi: 10.1080/01621459.2012.682530]
    [14] Gopalan P, Mimno D, Gerrish SM, Freedman MJ, Blei DM. Scalable inference of overlapping communities. In: Pereira F, Burges CJC, Bottou L, Weinberger KQ, eds. Proc. of the 26th Annual Conf. on Neural Information Processing Systems 2012. San Francisco: Inc. Curran Associates, 2012. 2258-2266.
    [15] Gopalan PK, Blei DM. Efficient discovery of overlapping communities in massive networks. Proc. of the National Academy of Sciences, 2013,110(36):14534-14539. [doi: 10.1073/pnas.1221839110]
    [16] Gopalan PK, Wang C, Blei DM. Modeling overlapping communities with node popularities. In: Burges CJC, Bottou L, Ghahramani Z, Weinberger KQ, eds. Proc. of the 27th Annual Conf. on Neural Information Processing Systems 2013. San Francisco: Inc. Curran Associates, 2013. 2850-2858.
    [17] Hoffman MD, Blei DM, Wang C, Paisley J. Stochastic variational inference. Journal of Machine Learning Research, 2013,14: 1307-1347.
    [18] Kim DI, Gopalan P, Blei DM, Sudderth EB. Efficient online inference for bayesian nonparametric relational models. In: Burges CJC, Bottou L, Ghahramani Z, Weinberger KQ, eds. Proc. of the 27th Annual Conf. on Neural Information Processing Systems 2013. San Francisco: Inc. Curran Associates, 2013. 962-970.
    [19] Zanghi H, Ambroise C, Miele V. Fast online graph clustering via Erdös-Rényi mixture. Pattern Recognition, 2008,41(12): 3592-3599. [doi: 10.1016/j.patcog.2008.06.019]
    [20] Zanghi H, Picard F, Miele V, Ameroise C. Strategies for online inference of model-based clustering in large and growing networks. The Annals of Applied Statistics, 2010,4(2):687-714. [doi: 10.1214/10-AOAS359]
    [21] Ho QR, Yin JM, Xing EP. On triangular versus edge representations—Towards scalable modeling of networks. In: Pereira F, Burges CJC, Bottou L, Weinberger KQ, eds. Proc. of the 26th Annual Conf. on Neural Information Processing Systems 2012. San Francisco: Inc. Curran Associates, 2012. 2141-2149.
    [22] Yin JM, Ho QR, Xing EP. A scalable approach to probabilistic latent space inference of large-scale networks. In: Burges CJC, Bottou L, Ghahramani Z, Weinberger KQ, eds. Proc. of the 27th Annual Conf. on Neural Information Processing Systems 2013. San Francisco: Inc. Curran Associates, 2013. 422-430.
    [23] Wainwright MJ, Jordan MI. Graphical models, exponential families, and variational inference. Foundations and Trends in Machine Learning, 2008,1(1-2):1-305.
    [24] Jordan MI, Ghahramani Z, Jaakkola TS, Saul LK. An introduction to variational methods for graphical models. Machine Learning, 1999, 37:183-233. [doi: 10.1023/A:1007665907178]
    [25] Amari S. Natural gradient works efficiently in learning. Neural Computation, 1998,10(2):251-276. [doi: 10.1162/ 0899766983000 17746]
    [26] Lai TL. Stochastic approximation. The Annals of Statistics, 2003,31(2):391-406. [doi: 10.1214/aos/1051027873]
    [27] Hoffman DH, Blei DM, Bach FR. Online learning for latent dirichlet allocation. In: Lafferty JD, Williams CKI, Shawe-Taylor J, Zemel RS, Culotta A, eds. Proc. of the 24 th Annual Conf. on Neural Information Processing Systems 2010. San Francisco: Curran Associates Group, Inc., 2010. 856-864.
    [28] Neal RM, Hinton GE. A view of the EM algorithm that justifies incremental, sparse, and other variants. In: Jordan MI, ed. Proc. of the Learning in Graphical Models. Cambridge: MIT Press, 1999. 335-368. [doi: 10.1007/978-94-011-5014-9_12]
    [29] Bottou L. Online algorithms and stochastic approximations. In: Saad D, ed. Proc. of the Online Learning and Neural Networks. Cambridge: Cambridge University Press, 1998.
    [30] Hoi S CH, Wang JL, Zhao PL. LIBOL: A library for online learning algorithms. Journal of Machine Learning Research, 2014,15: 495-499.
    [31] Cappé O, Moulines E. Online EM algorithm for latent data models. Journal of the Royal Statistical Society B, 2009,71(3):593-613.
    [32] Cappé O. Online Expectation-Maximisation. Mixtures: Estimation and Applications, 2011. 31-53.
    [33] Papadopoulos F, Kitsak M, Serrano M, Boguna M, Krioukov D. Popularity versus similarity in growing networks. Nature, 2012,489(7417):537-540. [doi: 10.1038/nature11459]
    [34] Nepusz T, Petrczi A, Ngyessy L, Bazso F. Fuzzy communities and the concept of bridgeness in complex networks. Physical Review E, 2008,77(1):016107. [doi: 10.1103/PhysRevE.77.016107]
    引证文献
    网友评论
    网友评论
    分享到微博
    发 布
引用本文

柴变芳,贾彩燕,于剑.基于概率模型的大规模网络结构发现方法.软件学报,2014,25(12):2753-2766

复制
分享
文章指标
  • 点击次数:6070
  • 下载次数: 9040
  • HTML阅读次数: 2449
  • 引用次数: 0
历史
  • 收稿日期:2014-04-14
  • 最后修改日期:2014-08-21
  • 在线发布日期: 2014-12-04
文章二维码
您是第19830950位访问者
版权所有:中国科学院软件研究所 京ICP备05046678号-3
地址:北京市海淀区中关村南四街4号,邮政编码:100190
电话:010-62562563 传真:010-62562533 Email:jos@iscas.ac.cn
技术支持:北京勤云科技发展有限公司

京公网安备 11040202500063号