Abstract: With the rapid growth in data volume and structural diversity, providing a real-time and reliable parallel runtime environment for large-scale data processing on servers with heterogeneous multiple accelerators has become a research hotspot in high-performance computing and databases. Modern servers equipped with multiple accelerators (GPUs) have become the preferred high-performance platform for analyzing large-scale, irregular graph data. Graph computing systems and algorithms designed for multi-GPU server architectures (such as breadth-first search and shortest-path algorithms) already significantly outperform their multi-core CPU counterparts. However, inter-GPU data transfer in existing systems is limited by PCI-E bandwidth and local latency, so performance does not grow linearly as GPUs are added and may even suffer severe latency jitter, failing to meet the high-scalability requirements of large-scale parallel graph computing systems. A series of benchmark experiments reveals two classes of defects in existing systems. (1) The hardware architecture of data links between modern GPU devices is evolving rapidly (e.g., NVLink-V1 and NVLink-V2), with greatly improved link bandwidth and latency; however, existing systems still rely on PCI-E for data communication and cannot fully exploit modern GPU link resources (including link topology, connectivity, and routing). (2) When processing irregular graph data, such systems usually adopt a single inter-device data movement strategy, which incurs substantial unnecessary data synchronization between GPUs over the PCI-E bus and excessive waiting time in local computation. It is therefore urgent to exploit the diverse communication links among modern GPUs to design a highly scalable graph computing system. To achieve high scalability in multi-GPU graph computing, this study proposes a fine-grained, hybrid-aware communication scheme: it probes the interconnect architecture in advance, applies modular data links and communication strategies to different graph-structured data, and selects the optimal data exchange method for large-scale graph data (both structural data and application data). Based on these optimization strategies, this study designs and implements ChattyGraph, a multi-GPU parallel graph computing system. By optimizing data buffers and multi-GPU collaborative computation with OpenMP and NCCL, ChattyGraph adaptively and efficiently supports a variety of parallel graph applications and algorithms on multi-GPU HPC platforms. Experiments with several real-world graphs on an 8-GPU NVIDIA DGX server show that ChattyGraph significantly improves graph computing efficiency and scalability and outperforms other state-of-the-art systems, including WS-VR and Groute: average computing efficiency is improved by 1.2×-1.5× and the average speedup by 2×-3×.
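To make the link-awareness step concrete, the sketch below is a minimal illustration (assuming only the standard CUDA runtime API, not ChattyGraph's actual implementation) of how a multi-GPU system can probe which device pairs support direct peer-to-peer access, e.g., over NVLink, and enable it up front so that later transfers can take a direct GPU-to-GPU path instead of being staged through host memory over PCI-E.

    // Illustrative sketch: probe the inter-GPU link topology at startup and
    // enable peer access where the hardware supports it. Error handling is
    // omitted for brevity.
    #include <cuda_runtime.h>
    #include <cstdio>
    #include <vector>

    int main() {
        int ndev = 0;
        cudaGetDeviceCount(&ndev);
        // peer[i*ndev + j] == 1 if device i can directly access device j's memory.
        std::vector<int> peer(ndev * ndev, 0);

        for (int i = 0; i < ndev; ++i) {
            cudaSetDevice(i);
            for (int j = 0; j < ndev; ++j) {
                if (i == j) continue;
                int can = 0;
                cudaDeviceCanAccessPeer(&can, i, j);
                peer[i * ndev + j] = can;
                if (can) cudaDeviceEnablePeerAccess(j, 0);  // direct GPU-to-GPU path
            }
        }

        // Print the discovered link matrix; a communication scheduler could use
        // it to route structural and application data over the fastest link.
        for (int i = 0; i < ndev; ++i) {
            for (int j = 0; j < ndev; ++j) printf("%d ", peer[i * ndev + j]);
            printf("\n");
        }
        return 0;
    }

On top of such a link matrix, bulk synchronization of application data across devices would typically be carried out with NCCL collectives (e.g., ncclAllReduce), which themselves select NVLink or PCI-E paths according to the detected topology.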