GPPR: Cross-geo-distributed Personalized PageRank Algorithm
Author:
Affiliation:

Clc Number:

Fund Project:

  • Article
  • |
  • Figures
  • |
  • Metrics
  • |
  • Reference
  • |
  • Related
  • |
  • Cited by
  • |
  • Materials
  • |
  • Comments
    Abstract:

    Personalized PageRank, as a basic algorithm in large graph analysis, has a wide range of applications in search engines, social recommendation, community detection, and other fields, and has been a hot problem of interest to researchers. The existing distributed personalized PageRank algorithms assume that all data are located in the same geographic location and the network environment is the same among the computing nodes where the data are located. However, in the real world, these data may be distributed in multiple data centers across continents, and these cross-geo-distributed data centers are connected to each other through WANs, which are characterized by heterogeneous network bandwidth, huge hardware differences, and high communication costs. The distributed personalized PageRank algorithm requires multiple iterations and random wandering on the global graph. Therefore, the existing distributed personalized PageRank algorithms are not applicable to the cross-geo-distributed environment. To address this problem, the GPPR (cross-geo- distributed personalized PageRank) algorithm is proposed in this study. The algorithm first preprocesses the big graph data in the cross-geo-distributed environment and maps the graph data by using a heuristic algorithm to reduce the impact of network bandwidth heterogeneity on the iteration speed of the algorithm. Secondly, GPPR improves the random wandering approach and proposes a probability-based push algorithm to further reduce the number of iterations required by the algorithm by reducing the bandwidth load of transmitting data between working nodes. The GPPR algorithm is implemented based on the Spark framework and a real cross-geo-distributed environment in AliCloud is built to conduct experiments on eight open-source big graph data compared with several existing representative distributed personalized PageRank algorithms. The results show that the communication data volume of GPPR is reduced by 30% on average in the cross-geo-distributed environment compared with other algorithms. In terms of algorithm running efficiency, GPPR improves by an average of 2.5 times compared to other algorithms.

    Reference
    Related
    Cited by
Get Citation

陈子俊,马德龙,王一舒,袁野. GPPR:跨域分布式个性化PageRank算法.软件学报,2024,35(3):1090-1106

Copy
Share
Article Metrics
  • Abstract:
  • PDF:
  • HTML:
  • Cited by:
History
  • Received:July 17,2023
  • Revised:September 05,2023
  • Adopted:
  • Online: November 08,2023
  • Published: March 06,2024
You are the firstVisitors
Copyright: Institute of Software, Chinese Academy of Sciences Beijing ICP No. 05046678-4
Address:4# South Fourth Street, Zhong Guan Cun, Beijing 100190,Postal Code:100190
Phone:010-62562563 Fax:010-62562533 Email:jos@iscas.ac.cn
Technical Support:Beijing Qinyun Technology Development Co., Ltd.

Beijing Public Network Security No. 11040202500063