Detecting and Treatment Algorithm of Implicit Synchronization Based on Dependence Analysis in SPMD Program

    Abstract:

    SPMD translation compiles programs written in one SPMD-threaded programming model so that they can run on multiple kinds of devices. Existing work assumes that different threads are independent except where they communicate through explicit synchronization. However, data dependences between threads that rely on implicit synchronization cause correctness pitfalls in SPMD translation. To handle them, this paper systematically analyzes the implicit synchronizations in CUDA, a fine-grained SPMD programming model, and reveals the correctness pitfalls in existing SPMD translation from CUDA to multi-core. It then proposes a method for detecting implicit synchronizations based on dependence analysis. Building on this detection, an optimized treatment algorithm is designed that handles explicit and implicit synchronizations together through loop reordering. Experimental results show that, compared with existing SPMD translation, the detection and optimized treatment algorithms handle the various kinds of implicit synchronization in fine-grained SPMD translation correctly and quickly at small cost, which helps the compiler produce correct and efficient code.
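
    The pitfall described in the abstract can be illustrated with a small CPU-side simulation. The sketch below is not the paper's algorithm. It assumes a hypothetical two-statement kernel in which each thread reads a value written by its neighbor and relies on lockstep execution rather than an explicit barrier. The names `run_lockstep`, `run_serialized`, `has_cross_thread_dependence`, and `translate_and_run` are invented for this example, and the dependence check is reduced to comparing constant index offsets. Loop fission is used here as one simple instance of loop reordering.

```python
# Illustrative only: a hypothetical two-statement kernel, simulated on the CPU.
#   S1: a[t]   = inp[t] * 2
#   S2: out[t] = a[t] + a[(t + 1) % n]   # reads a value written by thread t+1
# On the GPU, threads that run in lockstep all finish S1 before any starts S2,
# so no explicit barrier is written: the synchronization is implicit.


def run_lockstep(inp):
    """Reference semantics: every thread finishes S1 before any runs S2."""
    n = len(inp)
    a = [x * 2 for x in inp]
    return [a[t] + a[(t + 1) % n] for t in range(n)]


def run_serialized(inp, split):
    """Run the threads one after another, as a thread loop on a CPU core.

    split=False: a single loop over threads (ignores the implicit sync).
    split=True:  the loop is fissioned between S1 and S2.
    """
    n = len(inp)
    a = [0] * n
    out = [0] * n
    if split:
        for t in range(n):          # loop 1: S1 for all threads
            a[t] = inp[t] * 2
        for t in range(n):          # loop 2: S2 for all threads
            out[t] = a[t] + a[(t + 1) % n]
    else:
        for t in range(n):          # S1 and S2 per thread, back to back
            a[t] = inp[t] * 2
            out[t] = a[t] + a[(t + 1) % n]   # a[t+1] may not be written yet
    return out


def has_cross_thread_dependence(write_offset, read_offsets):
    """Simplified check (an assumption of this sketch): S1 writes a[t + write_offset]
    and S2 reads a[t + r] for each r. Any r that differs from write_offset means
    S2 reads an element produced by a different thread."""
    return any(r != write_offset for r in read_offsets)


def translate_and_run(inp):
    # S1 writes a[t + 0]; S2 reads a[t + 0] and a[t + 1].
    split = has_cross_thread_dependence(0, [0, 1])
    return run_serialized(inp, split)
```

    With `inp = [1, 2, 3, 4]`, the single-loop version reads stale zeros for most neighbors, while the version chosen by the dependence check matches the lockstep reference.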

Get Citation

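Yue F, Pang JM, Zhao RC. Detecting and treatment algorithm of implicit synchronization based on dependence analysis in SPMD program. Ruan Jian Xue Bao/Journal of Software, 2013,24(8):1775-1785 (in Chinese).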

History
  • Received: July 03, 2011
  • Revised: October 19, 2012
  • Online: July 26, 2013