Optimization of HPL on Complex Heterogeneous Computing System
Author: Li LS (黎雷生), Yang WH (杨文浩), Ma WJ (马文静), Zhang Y (张娅), Zhao H (赵慧), Zhao HT (赵海涛), Li HY (李会元), Sun JC (孙家昶)

CLC Number: TP303

Fund Project: Strategic Priority Research Program of the Chinese Academy of Sciences (Category C) (XDC01030200); National Key Research and Development Program of China (2018YFB0204404, 2016YFB0200601); National Natural Science Foundation of China (11871455, 11971016)

Abstract:

    Nowadays, mainstream supercomputers increasingly adopt heterogeneous systems with accelerators. The growing floating-point performance of accelerators requires the other components, including the CPU, memory, bus, and network, to match their speed. High Performance Linpack (HPL) is the traditional benchmark for high-performance computers, and complex heterogeneous systems bring both opportunities and challenges to benchmarking with HPL. Therefore, for heterogeneous supercomputers, a new task-partitioning scheme between the CPU and the accelerators is proposed, using balance-point theory to guide the optimization of HPL. To optimize HPL, a look-ahead algorithm is proposed to coordinate the collaboration of the CPU and the accelerators, together with a contiguous row-swap algorithm that enables parallelism among the CPU, the accelerators, and the network. In addition, new panel-factorization and row-swap implementations are designed for systems with accelerators, improving the effectiveness and efficiency of accelerator usage. With 4 GPUs on each computing node, the optimized HPL achieves an efficiency of 79.51% on a single node.
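    The abstract's panel-factorization, row-swap, and trailing-update stages all operate on the blocked right-looking LU decomposition at the core of HPL. The structure those stages act on can be sketched as follows: a minimal pure-Python illustration without pivoting, row broadcast, or CPU/accelerator offload (function and variable names are ours, not from the paper; the comments only indicate which stage each loop corresponds to in the paper's scheme):

```python
NB = 2  # block (panel) width; HPL typically uses a few hundred

def lu_blocked(A):
    """In-place blocked LU: A = L*U, with unit lower-triangular L stored
    below the diagonal and U stored on and above it. No pivoting."""
    n = len(A)
    for k in range(0, n, NB):
        kb = min(NB, n - k)
        # 1) Panel factorization: unblocked LU of the k-th column panel
        #    (kept on the CPU in the paper's CPU/accelerator partitioning).
        for j in range(k, k + kb):
            for i in range(j + 1, n):
                A[i][j] /= A[j][j]
                for c in range(j + 1, k + kb):
                    A[i][c] -= A[i][j] * A[j][c]
        # 2) Triangular solve U12 = L11^{-1} * A12 (in real HPL this is
        #    where broadcast and row swapping of the panel occur).
        for j in range(k, k + kb):
            for i in range(j + 1, k + kb):
                for c in range(k + kb, n):
                    A[i][c] -= A[i][j] * A[j][c]
        # 3) Trailing-matrix update A22 -= L21 * U12: the GEMM-dominated
        #    stage offloaded to accelerators; with look-ahead, the columns
        #    needed by the next panel are updated first so stage 1 of
        #    iteration k+NB can start while this update finishes.
        for i in range(k + kb, n):
            for j in range(k, k + kb):
                for c in range(k + kb, n):
                    A[i][c] -= A[i][j] * A[j][c]
    return A
```

    Because stage 3 carries almost all the floating-point work while stages 1 and 2 are latency-bound, splitting them between accelerators and CPU, and overlapping them via look-ahead, is what the balance-point analysis in the paper targets.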

    References
    [1] Dongarra JJ, Luszczek P, Petitet A. The LINPACK benchmark: Past, present and future. Concurrency and Computation: Practice and Experience, 2003,15(9):803-820.
    [2] Official website. 2020. http://www.top500.org
    [3] Kurzak J, Luszczek P, Faverge M, Dongarra J. Programming the LU factorization for a multicore system with accelerators. In: Proc. of the Int'l Conf. on High Performance Computing for Computational Science. Berlin, Heidelberg, 2013. 28-35.
    [4] Bach M, Kretz M, Lindenstruth V, Rohr D. Optimized HPL for AMD GPU and multi-core CPU usage. Computer Science - Research and Development, 2011,26:153-164.
    [5] Wang F, Yang CQ, Du YF, Chen J, Yi HZ, Xu WX. Optimizing Linpack benchmark on GPU-accelerated petascale supercomputer. Journal of Computer Science and Technology, 2011,26(5):854-865.
    [6] Heinecke A, Vaidyanathan K, Smelyanskiy M, Kobotov A, Dubtsov R, Henry G, Shet A, Chrysos G, Dubey P. Design and implementation of the Linpack benchmark for single and multi-node systems based on Intel Xeon Phi coprocessor. In: Proc. of the IEEE 27th Int'l Symp. on Parallel and Distributed Processing. Los Alamitos, 2013. 126-137.
    [7] Gan XB, Hu YK, Liu J, Chi LH, Xu H, Gong CY, Li SG, Yan YH. Customizing the HPL for China accelerator. Science China Information Sciences, 2018,61(4):Article No.042102.
    [8] Official website. 2020. https://www.olcf.ornl.gov/summit/
    [9] Official website. 2020. https://www.hpcwire.com/2019/06/19/summit-achieves-445-petaflops-on-new-hpl-ai-benchmark/
    [10] Official website. 2020. http://www.netlib.org/benchmark/hpl/algorithm.html
Get Citation

黎雷生,杨文浩,马文静,张娅,赵慧,赵海涛,李会元,孙家昶. 复杂异构计算系统HPL的优化. 软件学报, 2021,32(8):2307-2318 (Li LS, Yang WH, Ma WJ, Zhang Y, Zhao H, Zhao HT, Li HY, Sun JC. Optimization of HPL on complex heterogeneous computing system. Journal of Software, 2021,32(8):2307-2318, in Chinese)

History
  • Received: August 20, 2019
  • Revised: December 05, 2019
  • Online: August 05, 2021
  • Published: August 06, 2021
Copyright: Institute of Software, Chinese Academy of Sciences. Beijing ICP No. 05046678-4
Address: 4# South Fourth Street, Zhong Guan Cun, Beijing, Postal Code: 100190
Phone: 010-62562563, Fax: 010-62562533, Email: jos@iscas.ac.cn