Optimizing Method for Improving the Performance of MPI Broadcast under Unbalanced Process Arrival Patterns
Author: Liu Zhiqiang, Song Junqiang, Lu Fengshun, Xu Fen
Affiliation:

    Abstract:

    This paper aims to improve the performance of MPI broadcast under unbalanced process arrival (UPA) patterns. It analyzes the problem with a performance model and proves that the negative impact of UPA on MPI broadcast can be effectively reduced through competition among the intra-node MPI processes of a multicore cluster. Based on this analysis, a new optimization method, called the competitive and pipelined (CP) method, is proposed. The CP method starts inter-node communication early in the broadcast through an intra-node competitive mechanism. In a CP-based broadcast algorithm, intra-node communication overlaps inter-node communication in a pipelined fashion: intra-node communication is implemented through shared memory, while inter-node communication is carried out by a set of leader MPI processes selected by the competitive mechanism. To verify the CP method, this paper applies it to three typical broadcast algorithms and evaluates the resulting algorithms on a real platform with a micro-benchmark and two practical applications. The results show that the CP method effectively improves the performance of broadcast algorithms under UPA patterns. In the application experiments, CP broadcasts perform about 16% better than pipelined (P) broadcasts and 18% to 24% better than the broadcast operation in MVAPICH2 1.2.
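    To make the two-level idea concrete, the following C/MPI sketch (an illustration written for this summary, not the paper's implementation) pipelines a broadcast in fixed-size chunks: one leader per node moves each chunk between nodes over a leader communicator, while the other processes on the node copy already-delivered chunks out of a shared-memory window, so intra-node copies of earlier chunks overlap the inter-node transfer of later chunks and a late-arriving process only delays its own copies. The chunk size, the flag layout, the names cp_bcast and CHUNK_BYTES, and the fixed choice of node-local rank 0 as leader are assumptions of this sketch; the paper instead selects the leaders through the intra-node competitive mechanism described above.

/* cp_bcast_sketch.c -- an illustrative sketch, not the paper's implementation.
 * Build: mpicc -O2 cp_bcast_sketch.c -o cp_bcast_sketch
 * Run:   mpirun -np <N> ./cp_bcast_sketch
 */
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define CHUNK_BYTES (64 * 1024)   /* pipeline granularity (assumed value) */

/* Broadcast nbytes bytes from world rank 0. One leader per node (node-local
 * rank 0 here) moves chunks between nodes; the remaining node-local processes
 * copy finished chunks out of a shared-memory window, so intra-node copies of
 * chunk c overlap the inter-node transfer of chunk c+1. */
static void cp_bcast(char *buf, int nbytes, MPI_Comm comm)
{
    int world_rank, node_rank;
    MPI_Comm node_comm, leader_comm;
    MPI_Comm_rank(comm, &world_rank);
    MPI_Comm_split_type(comm, MPI_COMM_TYPE_SHARED, 0, MPI_INFO_NULL, &node_comm);
    MPI_Comm_rank(node_comm, &node_rank);
    MPI_Comm_split(comm, node_rank == 0 ? 0 : MPI_UNDEFINED, world_rank, &leader_comm);

    int nchunks = (nbytes + CHUNK_BYTES - 1) / CHUNK_BYTES;
    MPI_Aint flags_off = ((MPI_Aint)nbytes + 7) & ~(MPI_Aint)7;   /* aligned flag area */
    MPI_Aint winsize = (node_rank == 0) ? flags_off + nchunks * sizeof(int) : 0;
    char *stage;
    MPI_Win win;
    MPI_Win_allocate_shared(winsize, 1, MPI_INFO_NULL, node_comm, &stage, &win);
    MPI_Aint qsize; int qdisp;
    MPI_Win_shared_query(win, 0, &qsize, &qdisp, &stage);         /* leader's segment */

    /* One-time setup: zero the per-chunk "ready" flags. A persistent
     * implementation would hoist everything above this point to init time,
     * so the per-broadcast path has no intra-node barrier. */
    if (node_rank == 0) memset(stage + flags_off, 0, nchunks * sizeof(int));
    MPI_Barrier(node_comm);
    MPI_Win_lock_all(MPI_MODE_NOCHECK, win);

    for (int c = 0; c < nchunks; ++c) {
        int off = c * CHUNK_BYTES;
        int len = (nbytes - off < CHUNK_BYTES) ? nbytes - off : CHUNK_BYTES;
        MPI_Aint flag_disp = flags_off + (MPI_Aint)c * sizeof(int);
        if (node_rank == 0) {
            if (world_rank == 0) memcpy(stage + off, buf + off, len); /* root publishes */
            MPI_Bcast(stage + off, len, MPI_BYTE, 0, leader_comm);    /* inter-node step */
            if (world_rank != 0) memcpy(buf + off, stage + off, len);
            MPI_Win_sync(win);                    /* make the chunk visible before its flag */
            int one = 1;
            MPI_Accumulate(&one, 1, MPI_INT, 0, flag_disp, 1, MPI_INT, MPI_REPLACE, win);
            MPI_Win_flush(0, win);
        } else {
            int ready = 0, dummy = 0;
            while (!ready) {                      /* wait only for chunks not yet delivered */
                MPI_Fetch_and_op(&dummy, &ready, MPI_INT, 0, flag_disp, MPI_NO_OP, win);
                MPI_Win_flush(0, win);
            }
            MPI_Win_sync(win);
            memcpy(buf + off, stage + off, len);  /* intra-node step via shared memory */
        }
    }

    MPI_Win_unlock_all(win);
    MPI_Win_free(&win);
    if (leader_comm != MPI_COMM_NULL) MPI_Comm_free(&leader_comm);
    MPI_Comm_free(&node_comm);
}

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);
    int rank, n = 1 << 20;                        /* 1 MiB message */
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    char *buf = malloc(n);
    if (rank == 0) for (int i = 0; i < n; ++i) buf[i] = (char)(i & 0x7f);

    cp_bcast(buf, n, MPI_COMM_WORLD);

    int ok = 1;                                   /* verify the received payload */
    for (int i = 0; i < n; ++i) if (buf[i] != (char)(i & 0x7f)) { ok = 0; break; }
    printf("rank %d: %s\n", rank, ok ? "ok" : "mismatch");
    free(buf);
    MPI_Finalize();
    return 0;
}

    Note that in this sketch the leaders never wait for node-local latecomers inside the chunk loop, which is the property the CP method exploits under UPA patterns.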

    References
    [1] Faraj A, Patarasuk P, Yuan X. A study of process arrival patterns for MPI collective operations. Int’l Journal of Parallel Programming, 2008,36(6):543-570. [doi: 10.1007/s10766-008-0070-9]
    [2] The MPI Forum. MPI-2: Extensions to the message-passing interface. 1997. http://www.mpi-forum.org/docs/mpi-20-html/mpi2-report.html
    [3] Kesavan R, Bondalapati K, Panda DK. Multicast on irregular switch-based networks with wormhole routing. In: Proc. of the IEEE HPCA. San Antonio, 1997. 48-57. [doi: 10.1109/HPCA.1997.569602]
    [4] Thakur R, Rabenseifner R, Gropp W. Optimization of collective communication operations in MPICH. Int’l Journal of High Performance Computing Applications, 2005,19(1):49-66. [doi: 10.1177/1094342005051521]
    [5] Patarasuk P, Faraj A, Yuan X. Pipelined broadcast on ethernet switched clusters. In: Proc. of the 20th IEEE IPDPS. Rhodes Island: IEEE, 2006. [doi: 10.1109/IPDPS.2006.1639364]
    [6] Mamidala AR, Kumar R, De D, Panda DK. MPI collectives on modern multicore clusters: Performance optimizations and communication characteristics. In: Proc. of the CCGRID. Lyon, 2008. 130-137. [doi: 10.1109/CCGRID.2008.87]
    [7] Watts J, van de Geijn R. A pipelined broadcast for multidimensional meshes. Parallel Processing Letters, 1995,5(2):281-292. [doi: 10.1142/S0129626495000266]
    [8] Open MPI. Open source high performance computing. http://www.open-mpi.org/
    [9] Ritzdorf H, Traff JL. Collective operations in NEC’s high-performance MPI libraries. In: Proc. of the 20th Int’l Parallel and Distributed Processing Symp. (IPDPS). Rhodes Island: IEEE, 2006. [doi: 10.1109/IPDPS.2006.1639334]
    [10] Patarasuk P, Yuan X. Efficient MPI bcast across different process arrival patterns. In: Proc. of the 22nd Int’l Parallel and Distributed Processing Symp. (IPDPS). Miami: IEEE, 2008. [doi: 10.1109/IPDPS.2008.4536308]
    [11] Qian Y, Afsahi A. Process arrival pattern and shared memory aware alltoall on InfiniBand. In: Proc. of the EuroPVM/MPI 2009. LNCS 5759, Espoo, 2009. 250-260. [doi: 10.1007/978-3-642-03770-2_31]
    [12] Faraj A, Kumar S, Smith B, Mamidala A, Gunnels J, Heidelberger P. MPI collective communications on the blue gene/P supercomputer. In: Proc. of the ICS. Yorktown Heights, 2009. 489-490. [doi: 10.1145/1542275.1542344]
    [13] MPICH2. http://www.mcs.anl.gov/mpi/mpich2
    [14] MVAPICH. http://mvapich.cse.ohio-state.edu
    [15] Kielmann T, Hofman RFH, Bal HE, Plaat A, Bhoedjang RAF. MagPIe: MPI’s collective communication operations for clustered wide area systems. In: Proc. of the PPoPP. Atlanta, 1999. 131-140.
    [16] Chen GL. Parallel Algorithm Practice. Beijing: Higher Education Press, 2004. 396-398 (in Chinese).
Citation

Liu ZQ, Song JQ, Lu FS, Xu F. Optimizing method for improving the performance of MPI broadcast under unbalanced process arrival patterns. Journal of Software (软件学报), 2011, 22(10): 2509-2522 (in Chinese).

History
  • Received: December 11, 2009
  • Revised: March 05, 2010