Graph Neural Network Training Acceleration for Multi-GPUs

doi:10.13328/j.cnki.jos.006647


Volume 34, Issue 9, 2023, pp. 4407-4420. DOI:10.13328/j.cnki.jos.006647


Author:
MIAO Xu-Peng, School of Computer Science, Peking University, Beijing 100871, China
WANG Yu-Jie, School of Computer Science, Peking University, Beijing 100871, China
SHEN Jia, School of Computer Science, Peking University, Beijing 100871, China
SHAO Ying-Xia, School of Computer Science, Beijing University of Posts and Telecommunications, Beijing 100876, China
CUI Bin, School of Computer Science, Peking University, Beijing 100871, China

                    


Abstract:

In recent years, graph neural networks (GNNs) have attracted wide attention for their powerful and flexible representation ability. As graph data keeps growing while GPU memory capacity remains limited, training GNNs with traditional general-purpose deep learning systems becomes increasingly difficult, and such systems cannot fully exploit the performance of GPU devices. Using GPU hardware efficiently for GNN training is therefore an important research problem in this field. Traditional approaches express GNN computation as sparse matrix multiplication and, when the memory of a single GPU is insufficient, spread the computation across devices through distributed matrix multiplication. These approaches have two main shortcomings: (1) sparse matrix multiplication ignores the sparsity distribution of the graph data, which leads to low computation efficiency; (2) they ignore the computation and memory access characteristics of GPUs and thus fail to make full use of the hardware resources. To improve training efficiency, some studies reduce the per-iteration cost and the storage requirements through graph sampling techniques, which also support flexible distributed scaling. However, because of the randomness and variance that sampling introduces, these methods often degrade model accuracy. This study therefore proposes a high-performance GNN training framework for multi-GPU systems. While preserving model accuracy, it explores different GNN partitioning strategies for multiple GPUs and investigates how different graph ordering patterns affect GPU performance during GNN computation. It also proposes block-sparsity-aware optimizations for GPU memory access. The prototype system is implemented in C++ and cuDNN. Experiments on four large-scale GNN datasets show that (1) the graph re-ordering method improves the GPU cache hit rate by around 40% and speeds up computation by about 2x; (2) compared with the existing system DGL, the proposed system achieves an overall speedup of 5.8x.
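The two ideas the abstract relies on, neighbor aggregation as a sparse matrix product and vertex re-ordering to concentrate nonzeros into fewer blocks, can be illustrated with a small sketch. This is not the paper's implementation: the abstract does not say which ordering or block format the system uses, so the breadth-first ordering, the tile-counting metric, and all function names below (`to_csr`, `aggregate`, `bfs_order`, `nonzero_blocks`) are assumptions chosen only for illustration, and the code runs on the CPU in plain Python.

```python
from collections import deque

def to_csr(num_nodes, edges):
    """Build CSR arrays (indptr, indices) for an undirected graph."""
    adj = [[] for _ in range(num_nodes)]
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    indptr, indices = [0], []
    for nbrs in adj:
        indices.extend(sorted(nbrs))
        indptr.append(len(indices))
    return indptr, indices

def aggregate(indptr, indices, feats):
    """Sum neighbor features: the sparse product out = A @ feats."""
    dim = len(feats[0])
    out = [[0.0] * dim for _ in range(len(indptr) - 1)]
    for row in range(len(indptr) - 1):
        for k in range(indptr[row], indptr[row + 1]):
            col = indices[k]
            for d in range(dim):
                out[row][d] += feats[col][d]
    return out

def bfs_order(indptr, indices):
    """Return new_id[old_id] from a breadth-first traversal.

    BFS is only a stand-in for whatever ordering the paper studies.
    """
    n = len(indptr) - 1
    new_id = [-1] * n
    next_id = 0
    for start in range(n):
        if new_id[start] != -1:
            continue
        new_id[start] = next_id
        next_id += 1
        queue = deque([start])
        while queue:
            u = queue.popleft()
            for k in range(indptr[u], indptr[u + 1]):
                v = indices[k]
                if new_id[v] == -1:
                    new_id[v] = next_id
                    next_id += 1
                    queue.append(v)
    return new_id

def relabel(edges, new_id):
    """Apply a vertex renumbering to an edge list."""
    return [(new_id[u], new_id[v]) for u, v in edges]

def nonzero_blocks(edges, block):
    """Count block x block tiles of the adjacency matrix that hold an edge."""
    tiles = set()
    for u, v in edges:
        tiles.add((u // block, v // block))
        tiles.add((v // block, u // block))
    return len(tiles)

# Toy graph: two 4-cliques whose vertex labels are interleaved.
edges = [(0, 2), (0, 4), (0, 6), (2, 4), (2, 6), (4, 6),
         (1, 3), (1, 5), (1, 7), (3, 5), (3, 7), (5, 7)]
feats = [[float(i)] for i in range(8)]

indptr, indices = to_csr(8, edges)
out_old = aggregate(indptr, indices, feats)

new_id = bfs_order(indptr, indices)
new_edges = relabel(edges, new_id)
feats_new = [None] * 8
for old, new in enumerate(new_id):
    feats_new[new] = feats[old]
out_new = aggregate(*to_csr(8, new_edges), feats_new)

before = nonzero_blocks(edges, block=4)
after = nonzero_blocks(new_edges, block=4)
print(before, after)  # 4 2
```

On this toy graph the re-ordering halves the number of occupied 4 x 4 tiles while leaving the aggregation result unchanged up to the permutation, which is the property a block-sparsity-aware kernel would exploit. The cache hit rate and speedup figures reported in the abstract come from the authors' GPU system and are not reproduced by this sketch.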

Key words: graph neural network (GNN); distributed computation; memory optimization; GPU acceleration

Get Citation

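MIAO Xu-Peng, WANG Yu-Jie, SHEN Jia, SHAO Ying-Xia, CUI Bin. Graph Neural Network Training Acceleration for Multi-GPUs. Journal of Software, 2023, 34(9): 4407-4420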



History

Received: August 02, 2021
Revised: September 26, 2021
Online: January 04, 2023
Published: September 06, 2023
