Parallel Algorithm and Efficient Implementation of HPCG on Domestic Heterogeneous Systems

doi:10.13328/j.cnki.jos.006006


Volume 32, Issue 8, 2021, pp. 2341-2351. DOI:10.13328/j.cnki.jos.006006

Author:
LIU Fang-Fang
Laboratory of Parallel Software and Computational Science, Institute of Software, Chinese Academy of Sciences, Beijing 100190, China; University of Chinese Academy of Sciences, Beijing 100049, China; State Key Laboratory of Computer Science (Institute of Software, Chinese Academy of Sciences), Beijing 100190, China

WANG Zhi-Jun
Laboratory of Parallel Software and Computational Science, Institute of Software, Chinese Academy of Sciences, Beijing 100190, China; University of Chinese Academy of Sciences, Beijing 100049, China

WANG Quan
Laboratory of Parallel Software and Computational Science, Institute of Software, Chinese Academy of Sciences, Beijing 100190, China; University of Chinese Academy of Sciences, Beijing 100049, China

WU Li-Xin
Laboratory of Parallel Software and Computational Science, Institute of Software, Chinese Academy of Sciences, Beijing 100190, China; University of Chinese Academy of Sciences, Beijing 100049, China

MA Wen-Jing
Laboratory of Parallel Software and Computational Science, Institute of Software, Chinese Academy of Sciences, Beijing 100190, China; State Key Laboratory of Computer Science (Institute of Software, Chinese Academy of Sciences), Beijing 100190, China

YANG Chao
School of Mathematical Sciences, Peking University, Beijing 100871, China

SUN Jia-Chang
Laboratory of Parallel Software and Computational Science, Institute of Software, Chinese Academy of Sciences, Beijing 100190, China
CLC Number: TP303
Fund Project: Strategic Priority Research Program of the Chinese Academy of Sciences (Category C) (XDC01030200); National Key Research and Development Program of China (2018YFB0204404, 2016YFB0200603)


Abstract:

The HPCG benchmark is a new standard for ranking supercomputers. It measures how fast a supercomputer can solve a large-scale sparse linear system, a workload closer to real applications, and it has attracted extensive attention recently. Research on parallel HPCG for domestic heterogeneous many-core supercomputers is important not only for improving the HPCG ranking of Chinese supercomputers, but also as a reference for the parallel algorithms and optimization techniques used in many applications. This work studies the parallelization and optimization of HPCG on a domestically produced complex heterogeneous supercomputer. It applies a blocked graph coloring algorithm to expose parallelism for the first time on this system and proposes a graph coloring algorithm for structured grids. The proposed algorithm yields higher parallelism than the traditional JPL and CC algorithms, with better coloring quality. With it, the number of HPCG iterations is reduced by 3 and overall performance improves by 6%. The study also analyzes the data transfer cost of each component in the complex heterogeneous system and provides a task partitioning method better suited to HPCG; the neighbor communication cost in SpMV and SymGS is hidden by inner-outer region partitioning. In the whole-system test, HPCG achieves 1.67% of the system's peak GFLOPS, and the weak-scaling efficiency of the whole system relative to single-node performance reaches 92%.

Key words: HPCG; domestic supercomputer; graph coloring; SpMV; SymGS

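Citation:

LIU Fang-Fang, WANG Zhi-Jun, WANG Quan, WU Li-Xin, MA Wen-Jing, YANG Chao, SUN Jia-Chang. Parallel Algorithm and Efficient Implementation of HPCG on Domestic Heterogeneous Systems. Journal of Software, 2021, 32(8): 2341-2351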


History

Received: August 22, 2019
Revised: December 05, 2019
Adopted:
Online: August 05, 2021
Published: August 06, 2021
