GPPR: Cross-geo-distributed Personalized PageRank Algorithm

doi:10.13328/j.cnki.jos.007072

Author:

CHEN Zi-Jun, School of Computer Science and Engineering, Northeastern University, Shenyang 110169, China
MA De-Long, School of Computer Science and Engineering, Northeastern University, Shenyang 110169, China
WANG Yi-Shu, School of Computer Science and Engineering, Northeastern University, Shenyang 110169, China
YUAN Ye, School of Computer Science and Technology, Beijing Institute of Technology, Beijing 100081, China

Fund Project: National Key Research and Development Program of China (2022YFB2702100); National Natural Science Foundation of China (61932004, 62225203, U21A20516); Fundamental Research Funds for the Central Universities (N232405-16)

Abstract:

Personalized PageRank is a fundamental algorithm in large-graph analysis, widely used in search engines, social recommendation, community detection, and other fields, and it has long been an active research topic. Existing distributed personalized PageRank algorithms assume that all data reside in the same geographic location and that the computing nodes holding the data share the same network environment. In practice, however, the data may be spread over multiple data centers across continents. These cross-geo-distributed data centers are connected through wide area networks (WANs), which are characterized by heterogeneous network bandwidth, large hardware differences, and high communication costs. Because distributed personalized PageRank requires multiple rounds of iteration and performs random walks on the global graph, the existing algorithms are not suitable for cross-geo-distributed environments. To address this problem, this study proposes the GPPR (cross-geo-distributed personalized PageRank) algorithm. GPPR first preprocesses the large graph data in the cross-geo-distributed environment and maps the graph data with a heuristic algorithm, reducing the impact of network bandwidth heterogeneity on the iteration speed. Second, GPPR improves the random-walk procedure and proposes a probability-based Push algorithm that reduces the bandwidth load of data transferred between worker nodes and thereby further reduces the number of iterations required. GPPR is implemented on the Spark framework and evaluated in a real cross-geo-distributed environment built on Alibaba Cloud (AliCloud), using eight open-source large graph datasets and several representative existing distributed personalized PageRank algorithms as baselines. The results show that, in the cross-geo-distributed environment, GPPR reduces the volume of communicated data by 30% on average compared with the other algorithms, and improves running efficiency by 2.5 times on average.

Key words: cross-geo-distributed; personalized PageRank; approximate calculation
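
For readers unfamiliar with push-style approximation of personalized PageRank, the sketch below shows the classical single-machine forward push procedure. It is not the probability-based Push algorithm of GPPR, whose details are not given in the abstract. The function name `forward_push`, the parameters `alpha` and `r_max`, the adjacency-list graph format, and the handling of dangling nodes are all assumptions made for illustration.

```python
from collections import defaultdict, deque


def forward_push(graph, source, alpha=0.15, r_max=1e-4):
    """Approximate single-source personalized PageRank by local forward push.

    graph  : dict mapping each node to a list of its out-neighbors
    source : the node the walk restarts from
    alpha  : restart (teleport) probability
    r_max  : a node is pushed while residue >= r_max * out-degree
    Returns (estimate, residue) as plain dicts.
    """
    estimate = defaultdict(float)   # settled probability mass
    residue = defaultdict(float)    # mass still waiting to be pushed
    residue[source] = 1.0
    queue = deque([source])
    queued = {source}

    while queue:
        u = queue.popleft()
        queued.discard(u)
        neighbors = graph.get(u, [])
        r_u = residue[u]

        if not neighbors:
            # Assumption: a dangling node absorbs all of its residue.
            estimate[u] += r_u
            residue[u] = 0.0
            continue
        if r_u < r_max * len(neighbors):
            continue  # below threshold, leave the residue in place

        # Settle an alpha fraction at u, spread the rest over out-neighbors.
        estimate[u] += alpha * r_u
        residue[u] = 0.0
        share = (1.0 - alpha) * r_u / len(neighbors)
        for v in neighbors:
            residue[v] += share
            deg_v = max(len(graph.get(v, [])), 1)
            if v not in queued and residue[v] >= r_max * deg_v:
                queue.append(v)
                queued.add(v)

    return dict(estimate), dict(residue)


if __name__ == "__main__":
    toy_graph = {0: [1, 2], 1: [2], 2: [0]}
    est, res = forward_push(toy_graph, source=0, r_max=1e-6)
    for node in sorted(est):
        print(node, round(est[node], 4))
```

Each push only touches a node and its out-neighbors, which is why push-style methods are a natural starting point when cross-node communication is expensive. In a distributed setting, every residue update that crosses a partition boundary becomes network traffic, which is the cost the abstract says GPPR aims to reduce.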

Cite this article

CHEN Zi-Jun, MA De-Long, WANG Yi-Shu, YUAN Ye. GPPR: Cross-geo-distributed Personalized PageRank Algorithm. Journal of Software, 2024, 35(3): 1090-1106

History

Received: 2023-07-17
Last revised: 2023-09-05
Published online: 2023-11-08
Published: 2024-03-06
