Distributed Optimization and Implementation of Graph Embedding Algorithms
Author: Zhang Wentao, Yuan Bin, Zhang Zhipeng, Cui Bin
Affiliation:

Funding:

National Key Research and Development Program of China (2018YFB1004403); National Natural Science Foundation of China (61832001); PKU-Tencent Joint Research Lab

    Abstract:

    With the advent of artificial intelligence, graph embedding techniques are increasingly used to mine information from graphs. However, real-world graphs are usually large, so distributed graph embedding is needed. Distributed graph embedding faces two main challenges: (1) there are many graph embedding methods, but no general framework that can describe most of them; (2) existing distributed implementations of graph embedding suffer from poor scalability and perform poorly on large graphs. To tackle these two challenges, a general framework for distributed graph embedding is first presented: the sampling and training processes are separated, so that the framework can describe different graph embedding methods. Second, a parameter-server-based model partitioning strategy is proposed, in which the model is partitioned across both workers and servers, and shuffling ensures that no model is exchanged among workers. A prototype system is implemented on a parameter server, and extensive experiments show that the partitioning-based strategy outperforms all baseline systems without loss of accuracy.
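The sampling/training separation described in the abstract can be sketched as follows. This is an illustrative assumption, not the paper's actual implementation: a DeepWalk-style random-walk sampler produces (center, context) pairs, and an independent skip-gram trainer with negative sampling consumes them, so a LINE- or node2vec-style sampler could be swapped in without touching the trainer.

```python
import math
import random

def random_walk_sampler(adj, walk_len=5, window=2, seed=0):
    """Yield (center, context) pairs from random walks over adjacency dict `adj`."""
    rng = random.Random(seed)
    for start in adj:
        walk = [start]
        while len(walk) < walk_len:
            nbrs = adj[walk[-1]]
            if not nbrs:
                break
            walk.append(rng.choice(nbrs))
        # Cut each walk into skip-gram windows.
        for i, u in enumerate(walk):
            for j in range(max(0, i - window), min(len(walk), i + window + 1)):
                if i != j:
                    yield u, walk[j]

def sgd_trainer(pairs, num_nodes, dim=8, lr=0.025, neg=2, seed=0):
    """Skip-gram with negative sampling over pre-sampled pairs (sampler-agnostic)."""
    rng = random.Random(seed)
    emb = [[rng.uniform(-0.5, 0.5) / dim for _ in range(dim)] for _ in range(num_nodes)]
    ctx = [[0.0] * dim for _ in range(num_nodes)]
    for u, v in pairs:
        # One positive target plus `neg` uniformly drawn negative targets.
        targets = [(v, 1.0)] + [(rng.randrange(num_nodes), 0.0) for _ in range(neg)]
        for t, label in targets:
            dot = sum(a * b for a, b in zip(emb[u], ctx[t]))
            g = lr * (label - 1.0 / (1.0 + math.exp(-dot)))
            for k in range(dim):
                emb[u][k], ctx[t][k] = emb[u][k] + g * ctx[t][k], ctx[t][k] + g * emb[u][k]
    return emb

adj = {0: [1, 2], 1: [0, 2], 2: [0, 1], 3: [0]}
embeddings = sgd_trainer(random_walk_sampler(adj), num_nodes=4)
```

Because the trainer only sees an iterator of pairs, the same training loop serves any sampling strategy, which is the point of the framework.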
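The shuffling idea behind the partitioning strategy can also be illustrated with a small sketch. Everything here is a hedged assumption about the mechanism: each worker owns a hash partition of the source-node embeddings, and training pairs are shuffled to the worker that owns their source node, so workers never exchange model shards with one another (only the context side would go through the parameter servers).

```python
from collections import defaultdict

def owner(node, num_workers):
    # Simple hash partitioning of the model across workers (an assumption).
    return node % num_workers

def shuffle_pairs(pairs, num_workers):
    """Route each (source, context) pair to the worker owning the source embedding."""
    buckets = defaultdict(list)
    for u, v in pairs:
        buckets[owner(u, num_workers)].append((u, v))
    return dict(buckets)

pairs = [(0, 1), (1, 2), (2, 0), (3, 0), (4, 1)]
routed = shuffle_pairs(pairs, num_workers=2)
# Worker 0 receives pairs with even source nodes; worker 1 the odd ones.
```

After this shuffle, every update to a source embedding is local to exactly one worker, which is what removes worker-to-worker model traffic.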

Citation:

Zhang WT, Yuan B, Zhang ZP, Cui B. Distributed optimization and implementation of graph embedding algorithms. Journal of Software, 2021, 32(3): 636-649 (in Chinese).
History
  • Received: August 23, 2020
  • Revised: September 03, 2020
  • Online: January 21, 2021
  • Published: March 06, 2021
Copyright: Institute of Software, Chinese Academy of Sciences Beijing ICP No. 05046678-4