Parallel Algorithm and Efficient Implementation of HPCG on Domestic Heterogeneous Systems
Author biographies:

刘芳芳 (1982-), female, Ph.D., professor-level senior engineer, CCF professional member; her research interests include high-performance extended math libraries, sparse iterative solvers, and heterogeneous many-core parallel computing.
马文静 (1981-), female, Ph.D., associate research fellow, CCF professional member; her research interest is high-performance computing.
王志军 (1995-), male, M.S.; his research interests include high-performance computing and parallel computing.
杨超 (1979-), male, Ph.D., research fellow, doctoral supervisor, CCF senior member; his research interests include high-performance computing and scientific and engineering computing.
汪荃 (1996-), female, M.S.; her research interest is parallel computing.
孙家昶 (1942-), male, research fellow, doctoral supervisor; his research interests include the methods, theory, and applications of scientific and engineering computing, and parallel computing.
吴丽鑫 (1994-), female, M.S.; her research interest is parallel computing.

Corresponding author:

马文静, E-mail: wenjing@iscas.ac.cn

CLC number:

TP303

Fund project:

Strategic Priority Research Program of the Chinese Academy of Sciences (Category C) (XDC01030200); National Key Research and Development Program of China (2018YFB0204404, 2016YFB0200603)



    Abstract:

    The HPCG benchmark is a new metric for ranking supercomputers. It mainly measures a supercomputer's ability to solve large-scale sparse linear systems, is closer to real applications, and has attracted wide attention in recent years. Studying heterogeneous many-core parallel HPCG on domestic supercomputers is of great significance: it not only improves the HPCG ranking of domestic supercomputers, but also provides a reference on parallel algorithms and optimization techniques for many applications. This work targets a domestic complex heterogeneous supercomputer. First, a blocked graph coloring algorithm is applied to parallelize HPCG, and a graph coloring algorithm tailored to structured grids is proposed; it achieves higher parallelism than traditional algorithms such as JPL and CC, with better coloring quality, and when applied to HPCG it reduces the number of iterations by 3 and improves overall performance by 6%. Second, the data transfer overhead between the components of the complex heterogeneous system is analyzed, a task partitioning scheme better suited to HPCG is proposed, and fine-grained optimizations are carried out on the sparse matrix storage format, sparse matrix reordering, and memory access. Third, in multi-process runs, an inner-outer region partitioning algorithm is used to hide the neighbor communication in the core kernels SpMV and SymGS. In the whole-system test, the achieved performance reaches 1.67% of the peak performance of the domestic supercomputer, and the whole-system weak-scaling parallel efficiency reaches 92% relative to a single node.
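    The structured-grid coloring algorithm itself is not detailed in this abstract. As a minimal, hypothetical illustration of why structured grids admit cheap, high-quality colorings, the C++ sketch below applies the classic 2x2x2 (8-color) parity pattern to a grid coupled by HPCG's 27-point stencil; the function name color_structured_grid and the row layout are assumptions made for this example, not the authors' implementation.

        // Hypothetical sketch: on an nx*ny*nz structured grid with a 27-point stencil,
        // two rows are coupled only if their (i, j, k) indices differ by at most 1 in
        // every dimension, so the parity pattern (i%2, j%2, k%2) gives 8 colors in which
        // same-colored rows never share a nonzero and can be updated in parallel
        // (e.g., inside a colored SymGS sweep).
        #include <cstdio>
        #include <vector>

        std::vector<int> color_structured_grid(int nx, int ny, int nz) {
            std::vector<int> color(static_cast<size_t>(nx) * ny * nz);
            for (int k = 0; k < nz; ++k)
                for (int j = 0; j < ny; ++j)
                    for (int i = 0; i < nx; ++i)
                        color[(static_cast<size_t>(k) * ny + j) * nx + i] =
                            (i % 2) + 2 * (j % 2) + 4 * (k % 2);  // colors 0..7
            return color;
        }

        int main() {
            auto c = color_structured_grid(16, 16, 16);
            std::printf("row 0 -> color %d, row 1 -> color %d\n", c[0], c[1]);
            return 0;
        }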

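    The inner-outer overlap mentioned in the abstract above follows a well-known pattern; the C++/MPI sketch below shows its general shape under simplified, assumed data structures (CsrMatrix, spmv_overlapped, and the neighbor/offset bookkeeping are hypothetical, not HPCG's actual code). The halo exchange is started with non-blocking calls, rows that reference only local entries of x are computed in the meantime, and boundary rows are processed once the exchange has completed.

        // Hedged sketch of hiding neighbor communication behind the inner part of a CSR SpMV.
        #include <mpi.h>
        #include <vector>

        struct CsrMatrix {
            int nrows;                        // number of local rows
            std::vector<int> rowptr, colidx;  // CSR arrays; colidx may index halo entries of x
            std::vector<double> val;
        };

        // y[first..last) = (A*x)[first..last)
        static void spmv_rows(const CsrMatrix& A, const std::vector<double>& x,
                              std::vector<double>& y, int first, int last) {
            for (int r = first; r < last; ++r) {
                double sum = 0.0;
                for (int p = A.rowptr[r]; p < A.rowptr[r + 1]; ++p)
                    sum += A.val[p] * x[A.colidx[p]];
                y[r] = sum;
            }
        }

        // x holds local entries followed by halo entries; rows [0, n_inner) reference only
        // local entries, rows [n_inner, nrows) may reference halo entries received from
        // neighbors. recv_offset[n] is the position in x where neighbor n's halo block
        // starts (recv_offset has neighbors.size()+1 entries).
        void spmv_overlapped(const CsrMatrix& A, std::vector<double>& x,
                             std::vector<double>& y, int n_inner,
                             const std::vector<int>& neighbors,
                             const std::vector<int>& recv_offset,
                             const std::vector<std::vector<double>>& send_buf) {
            std::vector<MPI_Request> reqs;
            for (size_t n = 0; n < neighbors.size(); ++n) {
                MPI_Request r;
                MPI_Irecv(x.data() + recv_offset[n], recv_offset[n + 1] - recv_offset[n],
                          MPI_DOUBLE, neighbors[n], 0, MPI_COMM_WORLD, &r);
                reqs.push_back(r);
                MPI_Isend(send_buf[n].data(), static_cast<int>(send_buf[n].size()),
                          MPI_DOUBLE, neighbors[n], 0, MPI_COMM_WORLD, &r);
                reqs.push_back(r);
            }
            spmv_rows(A, x, y, 0, n_inner);        // inner rows overlap with communication
            MPI_Waitall(static_cast<int>(reqs.size()), reqs.data(), MPI_STATUSES_IGNORE);
            spmv_rows(A, x, y, n_inner, A.nrows);  // boundary rows use received halo values
        }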

    References
    [1] Dongarra J, Heroux MA. Toward a new metric for ranking high performance computing systems. Technical Report, SAND2013-4744, Sandia National Laboratories, 2013.
    [2] Dongarra JJ, Luszczek P, Petitet A. The LINPACK benchmark:Past, present and future. Concurrency and Computation:Practice and Experience, 2003,15(9):803-820.
    [3] https://www.isc-hpc.com/
    [4] http://www.supercomputing.org/
    [5] Haidar A, Tomov S, Dongarra J, et al. Harnessing GPU tensor cores for fast FP16 arithmetic to speed up mixed-precision iterative refinement solvers. In:Proc. of the Int'l Conf. for High Performance Computing, Networking, Storage and Analysis (SC 2018). IEEE, 2018. 603-613.
    [6] Haidar A, Wu P, Tomov S, et al. Investigating half precision arithmetic to accelerate dense linear system solvers. In:Proc. of the 8th Workshop on Latest Advances in Scalable Algorithms for Large-scale Systems. ACM, 2017. 1-8.
    [7] Bell N, Garland M. Implementing sparse matrix-vector multiplication on throughput-oriented processors. In:Proc. of the Conf. on High Performance Computing Networking, Storage and Analysis. 2009. 1-11.
    [8] Vazquez F, Fernandez J, Garzon E. A new approach for sparse matrix vector product on NVIDIA GPUs. Concurrency and Computation:Practice and Experience, 2011,23(8):815-826.
    [9] Ashari A, Sedaghati N, Eisenlohr J, et al. An efficient two-dimensional blocking strategy for sparse matrix-vector multiplication on GPUs. In:Proc. of the 28th ACM Int'l Conf. on Supercomputing. ACM Press, 2014. 273-282.
    [10] Yan S, Li C, Zhang Y, et al. yaSpMV:Yet another SpMV framework on GPUs. ACM SIGPLAN Notices, 2014,49(8):107-118.
    [11] Pichel JC, Rivera FF, Fernández M, et al. Optimization of sparse matrix-vector multiplication using reordering techniques on GPUs. Microprocessors and Microsystems, 2012,36(2):65-77.
    [12] Tang WT, Tan WJ, Ray R, et al. Accelerating sparse matrix-vector multiplication on GPUs using bit-representation-optimized schemes. In:Proc. of the Int'l Conf. on High Performance Computing, Networking, Storage and Analysis. 2013. 1-12.
    [13] Williams S, Oliker L, Vuduc R, et al. Optimization of sparse matrix-vector multiplication on emerging multicore platforms. Parallel Computing, 2009,35(3):178-194.
    [14] Guo D, Gropp W. Adaptive thread distributions for SpMV on a GPU. In:Proc. of the Extreme Scaling Workshop. 2012. 1-5.
    [15] Park J, Smelyanskiy M, Vaidyanathan K, et al. Efficient shared-memory implementation of high-performance conjugate gradient benchmark and its application to unstructured matrices. In:Proc. of the Int'l Conf. for High Performance Computing, Networking, Storage and Analysis (SC 2014). IEEE, 2014. 945-955.
    [16] Iwashita T, Nakanishi Y, Shimasaki M. Comparison criteria for parallel orderings in ILU preconditioning. SIAM Journal on Scientific Computing, 2005,26(4):1234-1260.
    [17] Kumahata K, Minami K, Maruyama N. High-performance conjugate gradient performance improvement on the K computer. The Int'l Journal of High Performance Computing Applications, 2016,30(1):55-70.
    [18] Phillips E, Fatica M. A CUDA implementation of the high performance conjugate gradient benchmark. In:Proc. of the Int'l Workshop on Performance Modeling, Benchmarking and Simulation of High Performance Computer Systems. Cham:Springer-Verlag, 2014. 68-84.
    [19] Liu YQ, Zhang XY, Yang C, et al. Accelerating HPCG on Tianhe-2:A hybrid CPU-MIC algorithm. In:Proc. of the 20th IEEE Int'l Conf. on Parallel and Distributed Systems (ICPADS). IEEE, 2014. 542-551.
    [20] Liu YQ. Research on key technologies of communication intensive kernels for Intel MIC architecture[Ph.D. Thesis]. Beijing:Institute of Software, Chinese Academy of Sciences, 2015(in Chinese with English abstract).
    [21] Ao YL. Research on key optimizations of sparse matrix and stencil computation for the domestic large many-core system[Ph.D. Thesis]. Beijing:Institute of Software, Chinese Academy of Sciences, 2017(in Chinese with English abstract).
    [22] Ruiz D, Mantovani F, Casas M, et al. The HPCG benchmark:Analysis, shared memory preliminary improvements and evaluation on an Arm-based platform. 2018. https://upcommons.upc.edu/bitstream/handle/2117/116642/1HPCG_shared_mem_implementation_tech_report.pdf?sequence=8&isAllowed=y
    [23] Greathouse JL, Daga M. Efficient sparse matrix-vector multiplication on GPUs using the CSR storage format. In:Proc. of the Int'l Conf. for High Performance Computing, Networking, Storage and Analysis. IEEE Press, 2014. 769-780.
    [24] Jones MT, Plassmann PE. A parallel graph coloring heuristic. SIAM Journal on Scientific Computing, 1993,14(3):654-669.
    [25] Cohen J, Castonguay P. Efficient graph matching and coloring on the GPU. In:Proc. of the GPU Technology Conf. 2012. 1-10.
Cite this article

刘芳芳, 王志军, 汪荃, 吴丽鑫, 马文静, 杨超, 孙家昶. Parallel algorithm and efficient implementation of HPCG on domestic heterogeneous systems. Journal of Software (软件学报), 2021, 32(8): 2341-2351 (in Chinese).

History
  • Received: 2019-08-22
  • Revised: 2019-12-05
  • Published online: 2021-08-05
  • Published: 2021-08-06