Optimizing Method for Improving the Performance of MPI Broadcast under Unbalanced Process Arrival Patterns

doi:10.3724/SP.J.1001.2011.03915


Volume 22, Issue 10, 2011, pp. 2509-2522. DOI: 10.3724/SP.J.1001.2011.03915


Author: LIU Zhi-Qiang, SONG Jun-Qiang, LU Feng-Shun, XU Fen
Affiliation: College of Computer, National University of Defense Technology, Changsha 410073, China


Abstract:

This paper aims to improve the performance of MPI broadcast under unbalanced process arrival (UPA) patterns. It analyzes the problem with a performance model and proves that, on a multicore cluster, competition among intra-node MPI processes can effectively reduce the negative impact of UPA on MPI broadcast. Based on this result, a new optimization method, called the competitive and pipelined (CP) method, is proposed. The CP method uses an intra-node competitive mechanism to start inter-node communications during the broadcast process. In a CP-based broadcast algorithm, intra-node communications overlap with inter-node communications through pipelining; intra-node communications are implemented through shared memory, while inter-node communications are carried out by a set of leader MPI processes selected by the competitive mechanism. To verify the CP method, the paper applies it to three typical broadcast algorithms and evaluates them on a real platform with a micro-benchmark and two practical applications. The results show that the CP method effectively improves the performance of broadcast algorithms under UPA patterns. In the practical applications, the performance of CP broadcasts is about 16% higher than that of P broadcasts and 18% to 24% higher than that of the broadcast operation in MVAPICH2 1.2.
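The intuition behind the competitive mechanism can be illustrated with a small toy sketch. It is not the paper's performance model or implementation: it assumes that "competition" means the first process to arrive on a node wins the leader role, and it ignores pipelining, shared-memory copies, and message costs. The names `NodeLeaderElection`, `elect_by_arrival`, and `inter_node_start` are hypothetical, and the arrival times are made-up illustration values.

```python
import threading


class NodeLeaderElection:
    """First-arrival-wins election among the processes of one node.

    Hypothetical helper: a lock-protected flag stands in for whatever
    shared-memory primitive a real MPI library would use.
    """

    def __init__(self):
        self._lock = threading.Lock()
        self.leader = None

    def try_become_leader(self, rank):
        # Only the first caller finds the slot empty and wins.
        with self._lock:
            if self.leader is None:
                self.leader = rank
                return True
            return False


def elect_by_arrival(arrival_times):
    """Replay arrivals in time order and return the winning rank."""
    election = NodeLeaderElection()
    for rank in sorted(arrival_times, key=arrival_times.get):
        election.try_become_leader(rank)
    return election.leader


def inter_node_start(arrival_times, fixed_leader=None):
    """Earliest time the node can take part in inter-node communication.

    With a fixed leader the node waits for that rank to arrive; with a
    competitive leader it can start as soon as any local rank arrives.
    """
    if fixed_leader is not None:
        return arrival_times[fixed_leader]
    return min(arrival_times.values())


# Made-up arrival times (rank -> time) for four processes on one node.
arrivals = {0: 5.0, 1: 0.5, 2: 3.0, 3: 1.2}
print(elect_by_arrival(arrivals))                   # 1
print(inter_node_start(arrivals, fixed_leader=0))   # 5.0
print(inter_node_start(arrivals))                   # 0.5
```

In this toy setting, a late-arriving fixed leader (rank 0) delays the node until time 5.0, whereas a competitively chosen leader lets the node start at 0.5. This is only meant to convey why leader selection by competition may reduce the impact of unbalanced arrivals.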

Key words: process arrival pattern; MPI; collective communication; MPI_Bcast; competitive and pipelined method


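Citation:

LIU Zhi-Qiang, SONG Jun-Qiang, LU Feng-Shun, XU Fen. Optimizing Method for Improving the Performance of MPI Broadcast under Unbalanced Process Arrival Patterns. Journal of Software (软件学报), 2011, 22(10): 2509-2522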


History

Received: December 11, 2009
Revised: March 05, 2010

Copyright: Institute of Software, Chinese Academy of Sciences