Reduction Algorithm Optimization Based on OpenCL
Authors: Yan Shengen, Zhang Yunquan, Long Guoping, Li Yan

    Abstract:

    Reduction algorithms are widely used in areas such as scientific computing and image processing. This paper systematically studies cross-platform performance optimization of the reduction algorithm on GPUs, based on the OpenCL framework. Previous research has generally focused on a single hardware architecture. This paper, by contrast, uses OpenCL to study a range of optimization methods, including vectorization, avoiding on-chip memory bank conflicts, thread organization, and instruction selection. Taking the minMax function as an example, it describes how each optimization method improves performance and explains the reasons in detail. The algorithm is tested on both AMD and NVIDIA GPU platforms, and the results show that the optimized algorithm performs well on both. On the AMD ATI Radeon HD 5850 platform, bandwidth utilization for Int and Float data reaches up to 89%. On the NVIDIA Tesla C2050 platform, performance reaches 1.3 to 1.9 times that of the corresponding CUDA version of the function.
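    To make the minMax reduction pattern concrete, the following is a minimal CPU-side sketch in Python that mimics how a GPU reduction is commonly organized: each work-item first scans the input with a grid-wide stride, each work-group then combines its per-item results with a tree reduction over a local array, and the per-group partial results are merged at the end. It is an illustration under stated assumptions, not the kernel from the paper; the names `combine` and `minmax_reduce` and the parameters `group_size` and `num_groups` are hypothetical, and vectorization, bank-conflict handling, and instruction selection are not modeled.

```python
def combine(a, b):
    """Merge two (min, max) pairs; None stands for an idle work-item."""
    if a is None:
        return b
    if b is None:
        return a
    return (min(a[0], b[0]), max(a[1], b[1]))


def minmax_reduce(data, group_size=8, num_groups=4):
    """Simulate a staged GPU-style min/max reduction on the CPU.

    group_size and num_groups are illustrative assumptions, not values
    taken from the paper.
    """
    if not data:
        raise ValueError("data must be non-empty")
    if group_size < 1 or group_size & (group_size - 1):
        raise ValueError("group_size must be a power of two")

    total_items = group_size * num_groups
    partials = []  # one (min, max) pair per simulated work-group

    for group in range(num_groups):
        # Stage 1: each work-item walks the input with a grid-wide stride
        # and keeps a private (min, max) pair in its "local memory" slot.
        local = []
        for lid in range(group_size):
            gid = group * group_size + lid
            chunk = data[gid::total_items]
            local.append((min(chunk), max(chunk)) if chunk else None)

        # Stage 2: tree reduction with sequential addressing. In each step
        # the lower half of the slots reads only from the upper half, so
        # running the items one after another matches a parallel step.
        stride = group_size // 2
        while stride > 0:
            for lid in range(stride):
                local[lid] = combine(local[lid], local[lid + stride])
            stride //= 2
        partials.append(local[0])

    # Stage 3: merge the per-group partial results.
    result = None
    for partial in partials:
        result = combine(result, partial)
    return result


print(minmax_reduce([5, 2, 9, -3, 7]))  # (-3, 9)
```

    In an actual OpenCL kernel, the inner loop of the tree reduction would run in parallel across work-items, with a barrier between steps; the sequential loop here only reproduces the access pattern.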

Citation

Yan SG, Zhang YQ, Long GP, Li Y. Reduction algorithm optimization based on OpenCL. Journal of Software (软件学报), 2011, 22(zk2): 163-171.

History
  • Received: July 15, 2011
  • Revised: December 02, 2011
  • Online: March 30, 2012