面向GPU平台的复杂网络core分解方法研究

doi:10.13328/j.cnki.jos.005627

微信服务号

微信订阅号

2025年4月5日 21:38 星期六

首页 > 过刊浏览>2020年第31卷第4期 >1225-1239. DOI:10.13328/j.cnki.jos.005627

PDF HTML阅读 XML下载导出引用引用提醒

面向GPU平台的复杂网络core分解方法研究
DOI:
                        10.13328/j.cnki.jos.005627
                    
CSTR:
                        
                    
作者:
                        张珩张珩
中国科学院 软件研究所, 北京 100190
在期刊界中查找
在百度中查找
在本站中查找
崔强崔强
中国科学院 软件研究所, 北京 100190
在期刊界中查找
在百度中查找
在本站中查找
侯朋朋侯朋朋
中国科学院 软件研究所, 北京 100190
在期刊界中查找
在百度中查找
在本站中查找
武延军武延军
中国科学院 软件研究所, 北京 100190
在期刊界中查找
在百度中查找
在本站中查找
赵琛赵琛
中国科学院 软件研究所, 北京 100190
在期刊界中查找
在百度中查找
在本站中查找

                    
作者单位:
作者简介:张珩(1990-),男,湖北天门人,博士,CCF学生会员,主要研究领域为分布式与并行计算,大数据处理,操作系统;武延军(1979-),男,博士,研究员,博士生导师,CCF高级会员,主要研究领域为操作系统;崔强(1985-),男,博士,主要研究领域为数据挖掘,推荐算法,众包测试;赵琛(1967-),男,博士,研究员,博士生导师,CCF高级会员,主要研究领域为编程语言,编译技术;侯朋朋(1985-),男,在读博士生,主要研究领域为操作系统,推荐系统.
通讯作者:zhangheng17@iscas.ac.cn
中图分类号:TP311
基金项目:国家重点研发计划（2018YFB0803600）；国家自然科学基金（61702488，61732020）

Accelerating Core Decomposition in Complex Network on GPUs

Author:

ZHANG Heng
ZHANG Heng
Institute of Software, Chinese Academy of Sciences, Beijing 100190, China
在期刊界中查找
在百度中查找
在本站中查找
CUI Qiang
CUI Qiang
Institute of Software, Chinese Academy of Sciences, Beijing 100190, China
在期刊界中查找
在百度中查找
在本站中查找
HOU Peng-Peng
HOU Peng-Peng
Institute of Software, Chinese Academy of Sciences, Beijing 100190, China
在期刊界中查找
在百度中查找
在本站中查找
WU Yan-Jun
WU Yan-Jun
Institute of Software, Chinese Academy of Sciences, Beijing 100190, China
在期刊界中查找
在百度中查找
在本站中查找
ZHAO Chen
ZHAO Chen
Institute of Software, Chinese Academy of Sciences, Beijing 100190, China
在期刊界中查找
在百度中查找
在本站中查找

Affiliation:

Fund Project:

National Key R&D Program of China (2018YFB0803600); National Natural Science Foundation of China (61702488, 61732020)

摘要

图/表

访问统计

参考文献

相似文献

引证文献

资源附件

文章评论

摘要:

在复杂网络理论中，core分解是一种最基本的度量网络节点"重要性"并分析核心子图的方法.Core分解广泛应用于社交网络的用户行为分析、复杂网络的可视化、大型软件的代码静态分析等应用.随着复杂网络的图数据规模和复杂性的增大，现有研究工作基于多核CPU环境设计core分解并行算法，由于CPU核数和内存带宽的局限性，已经无法满足大数据量的高性能计算需求，严重影响了复杂网络的分析应用.通用GPU提供了1万以上线程数的高并行计算能力和高于100GB/s访存带宽，已被广泛应用于大规模图数据的高效并行分析，如广度优先遍历和最短路径算法等.为了实现更为高效的core分解，提出面向GPU平台下的复杂网络core分解的两种并行策略.第1种RLCore策略基于图遍历思想，利用GPU高并发计算能力对网络图结构自底向上遍历，逐步迭代设置各节点所属的core层；第2种ESCore策略基于局部收敛思想，对各节点从邻居节点当前值进行汇聚计算更新直至收敛.ESCore相比RLCore能够大大降低遍历过程中GPU线程更新同一节点的同步操作开销，而其算法的迭代次数受收敛率的影响.在真实网络图数据上的实验结果表明，所提出的两个策略在效率和扩展性方面能够大幅优于现有其他方法，相比单线程上的算法高达33.6倍性能提升，且遍历边的吞吐性能（TEPS）达到406万条/s，单轮迭代的ESCore的执行效率高于RLCore.

关键词:复杂网络;GPU;Core分解;大规模图数据;大数据处理

Abstract:

To analysis the complex networks, core decomposition is a basic and efficient strategy to distinguish the relative importance of nodes and to discover a special family of core subgraphs in networks. After core decomposition, every node in each k-core subgraph connects to other k neighbor nodes internally. The core decomposition has been widely applied in several application scenarios, e.g., user behavior analysis in social networks, visualization of complex networks, static analysis in large system software project, etc. With the increasing scale and complexity of networks, existing works, which mostly focus on the multi-core CPU-based implementation of core decomposition, cannot satisfy the high performance of core decomposition in large-scale complex networks. Meanwhile, GPU provides not only massive parallelism degree (up to 10 000 threads) but also efficient memory I/O bandwidth (approximately 100 GB/s), which makes it an excellent hardware platform for large graph structure analytic, such as BFS (breadth first search), SSSP (single source shortest path) algorithms. This study proposes two strategies to enhance the parallel performance of core decomposition on GPU-based platform. An algorithm, RLCore, is first presented which exploits GPU-based bottom-up traverse approach and recursively distinguishes the core levels of nodes by considering their degree and edges. Then, a second optimal algorithm is proposed to improve performance and scalability, namely ESCore, based on the locality property of core decomposition. In ESCore, nodes gather and update their core level values from their neighbors, until there is no update among nodes. Compared to RLCore, ESCore strategy reduces the synchronization overhead from multi-thread contention when scaling to massive parallelism, whereas the iteration number of ESCore is depended on the convergence of nodes. From the evaluation results, two proposed acceleration algorithms achieve maximum 4.06 billion TEPS (traversed edges per second), which corresponds to up to 33.6X speedup compared to a single threaded CPU execution.

Key words:complex network;GPU;core decomposition;large graph;big data processing

引用本文

张珩,崔强,侯朋朋,武延军,赵琛.面向GPU平台的复杂网络core分解方法研究.软件学报,2020,31(4):1225-1239

复制

文章指标

点击次数:
下载次数:
HTML阅读次数:
引用次数:

历史

收稿日期:2017-07-14
最后修改日期:2017-09-01
录用日期:
在线发布日期: 2020-04-16
出版日期: 2020-04-06

微信服务号

微信订阅号

引用本文

分享

文章指标

历史

文章二维码

微信服务号

微信订阅号

引用本文

分享

微信扫一扫：分享

文章指标

历史

文章二维码