面向GPU平台的复杂网络core分解方法研究
作者:
作者单位:

作者简介:

张珩(1990-),男,湖北天门人,博士,CCF学生会员,主要研究领域为分布式与并行计算,大数据处理,操作系统;武延军(1979-),男,博士,研究员,博士生导师,CCF高级会员,主要研究领域为操作系统;崔强(1985-),男,博士,主要研究领域为数据挖掘,推荐算法,众包测试;赵琛(1967-),男,博士,研究员,博士生导师,CCF高级会员,主要研究领域为编程语言,编译技术;侯朋朋(1985-),男,在读博士生,主要研究领域为操作系统,推荐系统.

通讯作者:

zhangheng17@iscas.ac.cn

中图分类号:

TP311

基金项目:

国家重点研发计划(2018YFB0803600);国家自然科学基金(61702488,61732020)


Accelerating Core Decomposition in Complex Network on GPUs
Author:
Affiliation:

Fund Project:

National Key R&D Program of China (2018YFB0803600); National Natural Science Foundation of China (61702488, 61732020)

  • 摘要
  • |
  • 图/表
  • |
  • 访问统计
  • |
  • 参考文献
  • |
  • 相似文献
  • |
  • 引证文献
  • |
  • 资源附件
  • |
  • 文章评论
    摘要:

    在复杂网络理论中,core分解是一种最基本的度量网络节点"重要性"并分析核心子图的方法.Core分解广泛应用于社交网络的用户行为分析、复杂网络的可视化、大型软件的代码静态分析等应用.随着复杂网络的图数据规模和复杂性的增大,现有研究工作基于多核CPU环境设计core分解并行算法,由于CPU核数和内存带宽的局限性,已经无法满足大数据量的高性能计算需求,严重影响了复杂网络的分析应用.通用GPU提供了1万以上线程数的高并行计算能力和高于100GB/s访存带宽,已被广泛应用于大规模图数据的高效并行分析,如广度优先遍历和最短路径算法等.为了实现更为高效的core分解,提出面向GPU平台下的复杂网络core分解的两种并行策略.第1种RLCore策略基于图遍历思想,利用GPU高并发计算能力对网络图结构自底向上遍历,逐步迭代设置各节点所属的core层;第2种ESCore策略基于局部收敛思想,对各节点从邻居节点当前值进行汇聚计算更新直至收敛.ESCore相比RLCore能够大大降低遍历过程中GPU线程更新同一节点的同步操作开销,而其算法的迭代次数受收敛率的影响.在真实网络图数据上的实验结果表明,所提出的两个策略在效率和扩展性方面能够大幅优于现有其他方法,相比单线程上的算法高达33.6倍性能提升,且遍历边的吞吐性能(TEPS)达到406万条/s,单轮迭代的ESCore的执行效率高于RLCore.

    Abstract:

    To analysis the complex networks, core decomposition is a basic and efficient strategy to distinguish the relative importance of nodes and to discover a special family of core subgraphs in networks. After core decomposition, every node in each k-core subgraph connects to other k neighbor nodes internally. The core decomposition has been widely applied in several application scenarios, e.g., user behavior analysis in social networks, visualization of complex networks, static analysis in large system software project, etc. With the increasing scale and complexity of networks, existing works, which mostly focus on the multi-core CPU-based implementation of core decomposition, cannot satisfy the high performance of core decomposition in large-scale complex networks. Meanwhile, GPU provides not only massive parallelism degree (up to 10 000 threads) but also efficient memory I/O bandwidth (approximately 100 GB/s), which makes it an excellent hardware platform for large graph structure analytic, such as BFS (breadth first search), SSSP (single source shortest path) algorithms. This study proposes two strategies to enhance the parallel performance of core decomposition on GPU-based platform. An algorithm, RLCore, is first presented which exploits GPU-based bottom-up traverse approach and recursively distinguishes the core levels of nodes by considering their degree and edges. Then, a second optimal algorithm is proposed to improve performance and scalability, namely ESCore, based on the locality property of core decomposition. In ESCore, nodes gather and update their core level values from their neighbors, until there is no update among nodes. Compared to RLCore, ESCore strategy reduces the synchronization overhead from multi-thread contention when scaling to massive parallelism, whereas the iteration number of ESCore is depended on the convergence of nodes. From the evaluation results, two proposed acceleration algorithms achieve maximum 4.06 billion TEPS (traversed edges per second), which corresponds to up to 33.6X speedup compared to a single threaded CPU execution.

    参考文献
    相似文献
    引证文献
引用本文

张珩,崔强,侯朋朋,武延军,赵琛.面向GPU平台的复杂网络core分解方法研究.软件学报,2020,31(4):1225-1239

复制
分享
文章指标
  • 点击次数:
  • 下载次数:
  • HTML阅读次数:
  • 引用次数:
历史
  • 收稿日期:2017-07-14
  • 最后修改日期:2017-09-01
  • 录用日期:
  • 在线发布日期: 2020-04-16
  • 出版日期: 2020-04-06
文章二维码
您是第位访问者
版权所有:中国科学院软件研究所 京ICP备05046678号-3
地址:北京市海淀区中关村南四街4号,邮政编码:100190
电话:010-62562563 传真:010-62562533 Email:jos@iscas.ac.cn
技术支持:北京勤云科技发展有限公司

京公网安备 11040202500063号