High-Efficiency P-Rank Computation Through Asynchronous Accumulative Updates in Big Data Environment

doi:10.13328/j.cnki.jos.004637


Journal of Software, 2014, Vol. 25, No. 9, pp. 2136-2148.

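Authors:
WANG Xu-Cong, Department of Computer Science, School of Information, Renmin University of China, Beijing 100872, China
LI Cui-Ping, Data Warehouse and Business Intelligence Laboratory, School of Information, Renmin University of China, Beijing 100872, China
CHEN Hong, Data Warehouse and Business Intelligence Laboratory, School of Information, Renmin University of China, Beijing 100872, China

Funding: National Natural Science Foundation of China (61272137, 61033010, 61202114); National High Technology Research and Development Program of China (863 Program) (2014AA015204); National Basic Research Program of China (973 Program) (2012CB316205); National Social Science Foundation of China (12&ZD220); Research Funds of Renmin University of China (supported by the Fundamental Research Funds for the Central Universities) (10XNI018)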



Abstract:

P-Rank extends the traditional similarity measure SimRank and, like SimRank, measures the similarity between any two nodes in a graph. Whereas SimRank considers only in-link information, P-Rank also takes out-link information into account, so it evaluates "how similar two nodes are" more objectively and accurately. P-Rank is widely applied in graph mining. With the arrival of the big-data era, the volume of data that P-Rank must process keeps growing. Existing methods for large-scale iterative P-Rank computation, such as those built on distributed models like MapReduce, are essentially synchronous iterative methods and share the same shortcoming: the iteration time, especially the time processors spend waiting during each iteration, is long, so computation is slow and efficiency is low. To solve this problem, this paper adopts an iterative method called asynchronous accumulative update. Unlike traditional synchronous methods, it performs the computation asynchronously, which reduces the waiting time of processor nodes and speeds up the computation. This paper implements P-Rank with the asynchronous accumulative update method and runs comparative experiments; the results indicate that the method effectively improves the convergence speed of the computation.
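To make the contrast between the two iteration styles concrete, the following is a minimal single-machine sketch, not the paper's distributed implementation. It assumes the commonly cited P-Rank recurrence, in which the similarity of two distinct nodes is a weighted mix (weight `lam`) of the average similarity of their in-neighbor pairs and of their out-neighbor pairs, each scaled by a damping factor, with every node fully similar to itself. The function names, the single damping factor `c` shared by both directions, the threshold `eps`, and the largest-delta-first scheduling are all illustrative assumptions. `p_rank_sync` recomputes every pair from the previous round, while `p_rank_accumulative` lets each pair accumulate pending changes and forward them to dependent pairs in any order, which is the idea behind accumulative updates.

```python
from itertools import product


def build_links(nodes, edges):
    """Return in-neighbor and out-neighbor lists of a directed graph."""
    ins = {v: [] for v in nodes}
    outs = {v: [] for v in nodes}
    for u, v in edges:
        outs[u].append(v)
        ins[v].append(u)
    return ins, outs


def p_rank_sync(nodes, edges, lam=0.5, c=0.8, iters=100):
    """Synchronous iteration: each round reads only the previous round."""
    ins, outs = build_links(nodes, edges)
    sim = {(a, b): float(a == b) for a, b in product(nodes, repeat=2)}
    for _ in range(iters):
        new = {}
        for a, b in sim:
            if a == b:
                new[(a, b)] = 1.0  # a node is fully similar to itself
                continue
            score = 0.0
            # In-link term weighted by lam, out-link term by 1 - lam.
            for weight, nbrs in ((lam, ins), (1.0 - lam, outs)):
                na, nb = nbrs[a], nbrs[b]
                if na and nb:
                    total = sum(sim[(x, y)] for x in na for y in nb)
                    score += weight * c * total / (len(na) * len(nb))
            new[(a, b)] = score
        sim = new
    return sim


def p_rank_accumulative(nodes, edges, lam=0.5, c=0.8, eps=1e-10):
    """Accumulative updates: pairs absorb and forward deltas in any order."""
    ins, outs = build_links(nodes, edges)
    value = {p: 0.0 for p in product(nodes, repeat=2)}
    # Only the diagonal pairs start with a pending change (their score of 1).
    delta = {p: float(p[0] == p[1]) for p in value}
    while True:
        p = max(delta, key=delta.get)  # illustrative scheduling choice
        d = delta[p]
        if d < eps:
            break  # every pending change is negligible
        delta[p] = 0.0
        value[p] += d
        x, y = p
        # (x, y) is an in-neighbor pair of (a, b) when a in outs[x], b in outs[y],
        # and an out-neighbor pair of (a, b) when a in ins[x], b in ins[y].
        for weight, targets, degree in ((lam, outs, ins), (1.0 - lam, ins, outs)):
            for a in targets[x]:
                for b in targets[y]:
                    if a != b:  # diagonal scores stay fixed at 1
                        share = weight * c * d / (len(degree[a]) * len(degree[b]))
                        delta[(a, b)] += share
    return value


if __name__ == "__main__":
    nodes = ["a", "b", "c", "d"]
    edges = [("a", "b"), ("a", "c"), ("b", "d"), ("c", "d"), ("d", "a")]
    print(p_rank_sync(nodes, edges)[("b", "c")])
    print(p_rank_accumulative(nodes, edges)[("b", "c")])
```

In this toy graph, nodes `b` and `c` share their only in-neighbor and their only out-neighbor, so both functions should report the same high similarity for that pair. The accumulative version never waits for a global round to finish, which is the property an asynchronous distributed implementation would exploit; here a simple loop on one machine stands in for that scheduling.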

Key words: asynchronous accumulative update; big data; similarity; P-Rank; large-scale computation

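Cite this article:

WANG Xu-Cong, LI Cui-Ping, CHEN Hong. High-efficiency P-Rank computation through asynchronous accumulative updates in big data environment. Journal of Software, 2014, 25(9): 2136-2148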


History:

Received: 2014-01-24
Last revised: 2014-04-30
Published online: 2014-09-09
