Distributed Optimization and Implementation of Graph Embedding Algorithms
Author: Zhang Wentao, Yuan Bin, Zhang Zhipeng, Cui Bin
Affiliation:

Funding:

National Key Research and Development Program of China (2018YFB1004403); National Natural Science Foundation of China (61832001); PKU-Tencent Joint Research Lab

    Abstract:

    With the advent of artificial intelligence, graph embedding techniques are increasingly used to mine information from graphs. However, real-world graphs are usually large, so distributed graph embedding is needed. Distributed graph embedding faces two main challenges: (1) there are many graph embedding methods, but no general framework that can describe most of them; (2) existing distributed implementations of graph embedding suffer from poor scalability and perform poorly on large graphs. To tackle these two challenges, a general framework for distributed graph embedding is first presented: the sampling and training processes are separated, so that the framework can describe different graph embedding methods. Second, a parameter-server-based model partitioning strategy is proposed, in which the model is partitioned across both workers and servers, and shuffling ensures that no model is exchanged among workers. A prototype system is implemented on a parameter server, and extensive experiments show that the partitioning-based strategy outperforms all baseline systems without loss of accuracy.
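The sampling/training separation described in the abstract can be sketched as follows. This is an illustrative assumption, not the paper's actual implementation: a DeepWalk-style random-walk sampler produces (center, context) pairs, and an independent skip-gram trainer with negative sampling consumes them, so a LINE- or node2vec-style sampler could be swapped in without touching the trainer.

```python
import math
import random

def random_walk_sampler(adj, walk_len=5, window=2, seed=0):
    """Yield (center, context) pairs from random walks over adjacency dict `adj`."""
    rng = random.Random(seed)
    for start in adj:
        walk = [start]
        while len(walk) < walk_len:
            nbrs = adj[walk[-1]]
            if not nbrs:
                break
            walk.append(rng.choice(nbrs))
        # Cut each walk into skip-gram windows.
        for i, u in enumerate(walk):
            for j in range(max(0, i - window), min(len(walk), i + window + 1)):
                if i != j:
                    yield u, walk[j]

def sgd_trainer(pairs, num_nodes, dim=8, lr=0.025, neg=2, seed=0):
    """Skip-gram with negative sampling over pre-sampled pairs (sampler-agnostic)."""
    rng = random.Random(seed)
    emb = [[rng.uniform(-0.5, 0.5) / dim for _ in range(dim)] for _ in range(num_nodes)]
    ctx = [[0.0] * dim for _ in range(num_nodes)]
    for u, v in pairs:
        # One positive target plus `neg` uniformly drawn negative targets.
        targets = [(v, 1.0)] + [(rng.randrange(num_nodes), 0.0) for _ in range(neg)]
        for t, label in targets:
            dot = sum(a * b for a, b in zip(emb[u], ctx[t]))
            g = lr * (label - 1.0 / (1.0 + math.exp(-dot)))
            for k in range(dim):
                emb[u][k], ctx[t][k] = emb[u][k] + g * ctx[t][k], ctx[t][k] + g * emb[u][k]
    return emb

adj = {0: [1, 2], 1: [0, 2], 2: [0, 1], 3: [0]}
embeddings = sgd_trainer(random_walk_sampler(adj), num_nodes=4)
```

Because the trainer only sees an iterator of pairs, the same training loop serves any sampling strategy, which is the point of the framework.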
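The shuffling idea behind the partitioning strategy can also be illustrated with a small sketch. Everything here is a hedged assumption about the mechanism: each worker owns a hash partition of the source-node embeddings, and training pairs are shuffled to the worker that owns their source node, so workers never exchange model shards with one another (only the context side would go through the parameter servers).

```python
from collections import defaultdict

def owner(node, num_workers):
    # Simple hash partitioning of the model across workers (an assumption).
    return node % num_workers

def shuffle_pairs(pairs, num_workers):
    """Route each (source, context) pair to the worker owning the source embedding."""
    buckets = defaultdict(list)
    for u, v in pairs:
        buckets[owner(u, num_workers)].append((u, v))
    return dict(buckets)

pairs = [(0, 1), (1, 2), (2, 0), (3, 0), (4, 1)]
routed = shuffle_pairs(pairs, num_workers=2)
# Worker 0 receives pairs with even source nodes; worker 1 the odd ones.
```

After this shuffle, every update to a source embedding is local to exactly one worker, which is what removes worker-to-worker model traffic.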

Citation:

Zhang WT, Yuan B, Zhang ZP, Cui B. Distributed optimization and implementation of graph embedding algorithms. Journal of Software, 2021, 32(3): 636-649 (in Chinese).
History
  • Received: August 23, 2020
  • Revised: September 03, 2020
  • Online: January 21, 2021
  • Published: March 06, 2021
Copyright: Institute of Software, Chinese Academy of Sciences Beijing ICP No. 05046678-4