HPL Approach for Heterogeneous Computer Platforms

doi:10.13328/j.cnki.jos.006005


Volume 32, Issue 8, 2021, pp. 2329-2340


Author:
SUN Qiao
Laboratory of Parallel Software and Computational Science, Institute of Software, Chinese Academy of Sciences, Beijing 100190, China
SUN Jia-Chang
Laboratory of Parallel Software and Computational Science, Institute of Software, Chinese Academy of Sciences, Beijing 100190, China
MA Wen-Jing
Laboratory of Parallel Software and Computational Science, Institute of Software, Chinese Academy of Sciences, Beijing 100190, China; State Key Laboratory of Computer Science (Institute of Software, Chinese Academy of Sciences), Beijing 100190, China
ZHAO Yu-Wen
Laboratory of Parallel Software and Computational Science, Institute of Software, Chinese Academy of Sciences, Beijing 100190, China; University of Chinese Academy of Sciences, Beijing 100049, China
CLC Number: TP303
Fund Project: National Key Research and Development Program of China (2018YFB0204404); Strategic Priority Research Program of the Chinese Academy of Sciences (Category C) (XDC01030200)


Abstract:

HPL (High Performance Linpack) is a widely used benchmark for measuring computer performance. For decades, optimizing and tuning HPL has drawn sustained attention from both industry and academia as a way to evaluate the performance of cutting-edge computer platforms. This paper proposes Hetero-HPL, a high-performance HPL approach for current heterogeneous HPC platforms with multiple accelerating co-processors. In Hetero-HPL, the mapping between the process set and the (co-)processor set is adjustable, so that computation within each computing node can avoid inter-process message exchange, and each important procedure of the HPL algorithm can make full use of the node's hardware resources, such as memory, CPU cores, co-processors, and the PCIe bus. Without redundant computation or communication, the working set of Hetero-HPL is not restricted by the limit on pinned memory size in a single allocation, and it is distributed so that the workload is balanced among all co-processors and massive fine-grained parallelism can be exploited. On an experimental platform with four co-processors, Hetero-HPL reaches an efficiency of 76.5% on one computing node (the efficiency of the dgemm function is 84%), and further experiments suggest that Hetero-HPL is also a feasible approach in distributed environments.

Key words: HPL (high performance Linpack); multi-device heterogeneous platform; parallel computing
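
The abstract states that the working set is distributed so that the workload stays balanced among all co-processors, but it does not give the distribution scheme. As a rough illustration only, the sketch below assumes a 1-D block-cyclic mapping of column blocks to devices, which is the layout standard HPL uses across processes; the function names and the choice of 16 blocks on 4 devices are hypothetical and are not taken from the paper.

```python
from collections import Counter


def block_cyclic_owner(block_index: int, num_devices: int) -> int:
    """Return the device that owns a column block under a 1-D block-cyclic map.

    Assumption: blocks are dealt out round-robin; the paper's real mapping
    may differ.
    """
    return block_index % num_devices


def trailing_blocks_per_device(num_blocks: int, num_devices: int, step: int) -> list[int]:
    """Count the trailing blocks (index >= step) each device still has to update.

    In LU factorization the active matrix shrinks at every step, so a good
    distribution keeps these counts close to each other for all steps.
    """
    counts = Counter(
        block_cyclic_owner(b, num_devices) for b in range(step, num_blocks)
    )
    return [counts.get(d, 0) for d in range(num_devices)]


if __name__ == "__main__":
    # Hypothetical sizes: 16 column blocks spread over 4 co-processors.
    for step in range(0, 16, 5):
        print(step, trailing_blocks_per_device(16, 4, step))
```

Under this assumed mapping the per-device counts never differ by more than one block at any step, which is the kind of balance the abstract describes; a plain contiguous split would instead leave the devices holding the leading blocks idle as the factorization proceeds.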


Citation

SUN Qiao, SUN Jia-Chang, MA Wen-Jing, ZHAO Yu-Wen. HPL Approach for Heterogeneous Computer Platforms. Journal of Software (软件学报), 2021, 32(8): 2329-2340.


History

Received: August 22, 2019
Revised: December 05, 2019
Adopted:
Online: August 05, 2021
Published: August 06, 2021
