基于概率模型的大规模网络结构发现方法

doi:10.13328/j.cnki.jos.004722

微信服务号

微信订阅号

2025年4月17日 0:24 星期四

首页 > 过刊浏览>2014年第25卷第12期 >2753-2766. DOI:10.13328/j.cnki.jos.004722

PDF HTML阅读 XML下载导出引用引用提醒

基于概率模型的大规模网络结构发现方法
DOI:
                        10.13328/j.cnki.jos.004722
                    
CSTR:
                        
                    
作者:
                        柴变芳柴变芳
交通数据分析与挖掘北京市重点实验室(北京交通大学), 北京 100044;石家庄经济学院 信息工程学院, 河北 石家庄 050031
在期刊界中查找
在百度中查找
在本站中查找
贾彩燕贾彩燕
交通数据分析与挖掘北京市重点实验室(北京交通大学), 北京 100044
在期刊界中查找
在百度中查找
在本站中查找
于剑于剑
交通数据分析与挖掘北京市重点实验室(北京交通大学), 北京 100044
在期刊界中查找
在百度中查找
在本站中查找

                    
作者单位:
作者简介:
通讯作者:
中图分类号:
基金项目:国家自然科学基金(61473030,61370129);中央高校科研业务经费(2014YJS039);河北省自然科学基金(F2013205192);北京市科委项目(Z131110002813118);北大方正集团有限公司数字出版技术国家重点实验室开放课题;

Approaches of Structure Exploratory Based on Probabilistic Models in Massive Networks

Author:

CHAI Bian-Fang
CHAI Bian-Fang
Beijing Key Laboratory of Traffic Data Analysis and Mining (Beijing Jiaotong University), Beijing 100044, China;Department of Information Engineering, Shijiazhuang University of Economics, Shijiazhuang 050031, China
在期刊界中查找
在百度中查找
在本站中查找
JIA Cai-Yan
JIA Cai-Yan
Beijing Key Laboratory of Traffic Data Analysis and Mining (Beijing Jiaotong University), Beijing 100044, China
在期刊界中查找
在百度中查找
在本站中查找
YU Jian
YU Jian
Beijing Key Laboratory of Traffic Data Analysis and Mining (Beijing Jiaotong University), Beijing 100044, China
在期刊界中查找
在百度中查找
在本站中查找

Affiliation:

Fund Project:

摘要

图/表

访问统计

参考文献 [34]

相似文献 [20]

引证文献

资源附件

文章评论

摘要:

随着万维网和在线社交网站的发展,规模大、结构复杂、动态性强的大规模网络应用而生.发现这些网络的潜在结构,是分析和理解网络数据的基本途径.概率模型以其灵活的建模和解释能力、坚实的理论框架成为各领域研究网络结构发现任务的有效工具,但该类方法存在计算瓶颈.近几年出现了一些基于概率模型的大规模网络结构发现方法,主要从网络表示、结构假设、参数求解这3个方面解决计算问题.按照模型参数求解策略将已有方法归为两类:随机变分推理(stochastic variational inference)方法和在线EM(online expectation maximazation)方法,详细分析各方法的设计动机、原理和优缺点.定性和定量地对比、分析典型方法的特点和性能,并提出大规模网络结构发现模型的设计原则.最后,概括该领域研究的核心问题,展望未来发展趋势.

关键词:大规模网络;结构发现;随机变分推理;在线EM算法;三角形模体

Abstract:

The growth of the Internet and the emergence of online social websites bring up the development of massive networks which are large in scale, complex in structure, and dynamical in time. Exploring latent structure underlying a network is the fundamental solution to understand and analyze the network. Probabilistic models become effective tools in diverse areas of structure exploratory due to their flexibility in modeling, interpretability and the sound theoretical framework, however they incur computational bottlenecks. Recently, several approaches based on probabilistic models have been developed to explore structure in massive networks, which aim to solve the computational problems from three aspects: representations of a network, assumptions of the structure and methods of parameter estimation. This study classifies existing approaches as two categories by the methods of parameter estimation: approaches based on stochastic variational inference and online EM approaches, and analyzes in detail their designing incentives, principles, pros and cons. The properties and performance of classical models are compared and analyzed qualitatively and quantitatively, and as a result the principles are provided to develop approaches of structure detection in massive networks. Finally, the core problems of structure exploratory in massive networks are summarized based on probabilistic models and the development trend of this area is projected.

Key words:massive network;structure detection;stochastic varational inference;online EM algorithm;triangular motif

参考文献

[1] Wang YJ, Wong GY. Stochastic blockmodels for directed graphs. Journal of the American Statistical Association, 1987,82(397): 8-19. [doi: 10.1080/01621459.1987.10478385]

[2] Airodi EM, Blei DM, Fienberg SE, Stephen E, Eric XE. Mixed membership stochastic blockmodels. Journal of Machine Learning Research, 2008,9(1):1981-2014.

[3] Ren W, Yan GY, Liao XP, Lan X. Simple probabilistic algorithm for detecting community structure, Physical Review E, 2009, 79(3):036111.

[4] Shen HW, Cheng XQ, Guo JF. Exploring the structural regularities in networks. Physical Review E, 2011,84(5):056111. [doi: 10.1103/PhysRevE.84.056111]

[5] Karrer B, Newman MEJ. Stochastic blockmodels and community structure in networks. Physical Review E, 2011,83(11):016107. [doi: 10.1103/PhysRevE.83.016107]

[6] Ball B, Karrer B, Newman MEJ. An efficient and principled method for detecting communities in network. Physical Review, 2011, 84(3):036103. [doi: 10.1103/PhysRevE.84.036103]

[7] Hofman JM, Wiggins CH. Bayesian approach to network modularity. Physical Review Letters, 2008,100(25):258701. [doi: 10. 1103/PhysRevLett.100.258701]

[8] Latouche P, Birmele E, Ambroise C. Overlapping stochastic block models with application to the french political blogosphere. The Annals of Applied Statistics, 2011,5(1):309-336. [doi: 10.1214/10-AOAS382]

[9] Chai BF, Yu J, Jia CY, Yang TB, Jiang YW. Combining a popularity-productivity stochastic block model with a discriminative- content model for general structure detection. Physical Review E, 2013,88(3):012807. [doi: 10.1103/PhysRevE.88.012807]

[10] Yang B, Liu JM, Liu D. Characterizing and extracting multiplex patterns in complex networks. IEEE Trans. on Systems, Man, and Cybernetics—Part B: Cybernetics, 2012,42(2):469-481. [doi: 10.1109/TSMCB.2011.2167751]

[11] Nowicki K, Snijders T. Estimation and prediction for stochastic block structures. Journal of American Statistical Association, 2001, 96(455):1077-1087. [doi: 10.1198/016214501753208735]

[12] Daudin J, Picard F, Robin S. A mixture model for random graphs. Journal of Statistical Computation and Simulation, 2008,18(2):179-183. [doi: 10.1007/s11222-007-9046-7]

[13] Ho Q, Parikh AP, Xing EP. A multiscale community blockmodel for network exploration. Journal of the American Statistical Association, 2012,107(499):916-934. [doi: 10.1080/01621459.2012.682530]

[14] Gopalan P, Mimno D, Gerrish SM, Freedman MJ, Blei DM. Scalable inference of overlapping communities. In: Pereira F, Burges CJC, Bottou L, Weinberger KQ, eds. Proc. of the 26th Annual Conf. on Neural Information Processing Systems 2012. San Francisco: Inc. Curran Associates, 2012. 2258-2266.

[15] Gopalan PK, Blei DM. Efficient discovery of overlapping communities in massive networks. Proc. of the National Academy of Sciences, 2013,110(36):14534-14539. [doi: 10.1073/pnas.1221839110]

[16] Gopalan PK, Wang C, Blei DM. Modeling overlapping communities with node popularities. In: Burges CJC, Bottou L, Ghahramani Z, Weinberger KQ, eds. Proc. of the 27th Annual Conf. on Neural Information Processing Systems 2013. San Francisco: Inc. Curran Associates, 2013. 2850-2858.

[17] Hoffman MD, Blei DM, Wang C, Paisley J. Stochastic variational inference. Journal of Machine Learning Research, 2013,14: 1307-1347.

[18] Kim DI, Gopalan P, Blei DM, Sudderth EB. Efficient online inference for bayesian nonparametric relational models. In: Burges CJC, Bottou L, Ghahramani Z, Weinberger KQ, eds. Proc. of the 27th Annual Conf. on Neural Information Processing Systems 2013. San Francisco: Inc. Curran Associates, 2013. 962-970.

[19] Zanghi H, Ambroise C, Miele V. Fast online graph clustering via Erdös-Rényi mixture. Pattern Recognition, 2008,41(12): 3592-3599. [doi: 10.1016/j.patcog.2008.06.019]

[20] Zanghi H, Picard F, Miele V, Ameroise C. Strategies for online inference of model-based clustering in large and growing networks. The Annals of Applied Statistics, 2010,4(2):687-714. [doi: 10.1214/10-AOAS359]

[21] Ho QR, Yin JM, Xing EP. On triangular versus edge representations—Towards scalable modeling of networks. In: Pereira F, Burges CJC, Bottou L, Weinberger KQ, eds. Proc. of the 26th Annual Conf. on Neural Information Processing Systems 2012. San Francisco: Inc. Curran Associates, 2012. 2141-2149.

[22] Yin JM, Ho QR, Xing EP. A scalable approach to probabilistic latent space inference of large-scale networks. In: Burges CJC, Bottou L, Ghahramani Z, Weinberger KQ, eds. Proc. of the 27th Annual Conf. on Neural Information Processing Systems 2013. San Francisco: Inc. Curran Associates, 2013. 422-430.

[23] Wainwright MJ, Jordan MI. Graphical models, exponential families, and variational inference. Foundations and Trends in Machine Learning, 2008,1(1-2):1-305.

[24] Jordan MI, Ghahramani Z, Jaakkola TS, Saul LK. An introduction to variational methods for graphical models. Machine Learning, 1999, 37:183-233. [doi: 10.1023/A:1007665907178]

[25] Amari S. Natural gradient works efficiently in learning. Neural Computation, 1998,10(2):251-276. [doi: 10.1162/ 0899766983000 17746]

[26] Lai TL. Stochastic approximation. The Annals of Statistics, 2003,31(2):391-406. [doi: 10.1214/aos/1051027873]

[27] Hoffman DH, Blei DM, Bach FR. Online learning for latent dirichlet allocation. In: Lafferty JD, Williams CKI, Shawe-Taylor J, Zemel RS, Culotta A, eds. Proc. of the 24 th Annual Conf. on Neural Information Processing Systems 2010. San Francisco: Curran Associates Group, Inc., 2010. 856-864.

[28] Neal RM, Hinton GE. A view of the EM algorithm that justifies incremental, sparse, and other variants. In: Jordan MI, ed. Proc. of the Learning in Graphical Models. Cambridge: MIT Press, 1999. 335-368. [doi: 10.1007/978-94-011-5014-9_12]

[29] Bottou L. Online algorithms and stochastic approximations. In: Saad D, ed. Proc. of the Online Learning and Neural Networks. Cambridge: Cambridge University Press, 1998.

[30] Hoi S CH, Wang JL, Zhao PL. LIBOL: A library for online learning algorithms. Journal of Machine Learning Research, 2014,15: 495-499.

[31] Cappé O, Moulines E. Online EM algorithm for latent data models. Journal of the Royal Statistical Society B, 2009,71(3):593-613.

[32] Cappé O. Online Expectation-Maximisation. Mixtures: Estimation and Applications, 2011. 31-53.

[33] Papadopoulos F, Kitsak M, Serrano M, Boguna M, Krioukov D. Popularity versus similarity in growing networks. Nature, 2012,489(7417):537-540. [doi: 10.1038/nature11459]

[34] Nepusz T, Petrczi A, Ngyessy L, Bazso F. Fuzzy communities and the concept of bridgeness in complex networks. Physical Review E, 2008,77(1):016107. [doi: 10.1103/PhysRevE.77.016107]

引用本文

柴变芳,贾彩燕,于剑.基于概率模型的大规模网络结构发现方法.软件学报,2014,25(12):2753-2766

复制

文章指标

点击次数:6070
下载次数: 9040
HTML阅读次数: 2449
引用次数: 0

历史

收稿日期:2014-04-14
最后修改日期:2014-08-21
录用日期:
在线发布日期: 2014-12-04
出版日期:

微信服务号

微信订阅号

引用本文

分享

文章指标

历史

文章二维码

微信服务号

微信订阅号

引用本文

分享

微信扫一扫：分享

文章指标

历史

文章二维码