ParaC: A Domain Programming Framework of Image Processing on GPU Accelerators

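Fund Project:

National Natural Science Foundation of China (61432018, 61402445, 61502452, 61602443); National Key R&D Program of China (2016YFB1000402); State Key Laboratory of Mathematical Engineering and Advanced Computing Open Foundation (2016A03); Beijing Municipal Science & Technology Commission Program (D161100001216002)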


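    Abstract:

    GPGPU accelerators are currently the mainstream platform for speeding up image processing algorithms. On a GPGPU platform, however, a version of a program that is optimized to fully exploit the hardware architecture and the software characteristics can outperform a simple implementation of the same program by an order of magnitude. GPGPU accelerators provide a large number of threads organized in multiple dimensions and levels, together with a hierarchical memory system whose levels differ in capacity, bandwidth, latency, and access permissions. Image processing applications, in turn, involve complex computational operations, border-handling rules, and data access patterns. Consequently, the concurrent execution model of tasks, the organization of threads, and the mapping of concurrent tasks onto the device affect not only a program's degree of parallelism, scheduling, communication, and synchronization, but also its memory bandwidth and latency. Optimizing programs for GPGPU platforms is therefore a difficult, complex, and inefficient process. This paper proposes ParaC, a domain programming model based on language extensions. Using the program semantics expressed through high-level language extensions, the ParaC programming environment automatically analyzes and extracts program characteristics such as the application's operations, data reuse among concurrent tasks, and memory access information. Combining these with the characteristics of the hardware platform, it applies a compilation optimization model driven by domain prior knowledge to automatically generate optimized code for the GPGPU platform. Finally, a source-to-source compiler emits standard OpenCL programs. Experimental results on the test cases show that the optimized versions automatically generated by ParaC achieve speedups of up to 3.22× over the hand-optimized versions, while requiring only 1.2% to 39.68% of their lines of code.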

    Abstract:

    Image processing algorithms rely on GPU accelerators as their main speedup solution. However, the performance gap between a naïve implementation and a highly optimized one on the same GPU accelerator is frequently an order of magnitude or more. The GPGPU platform has complicated hardware architecture characteristics, such as a large number of multi-dimensional, multi-level threads and a deep memory hierarchy whose levels differ in capacity, bandwidth, latency, and access permissions. Additionally, image processing algorithms have complex operations, border-handling rules, and memory access patterns. Therefore, the parallel execution model of tasks, the organization of threads, and the mapping of parallel tasks to the device not only have a large impact on scalability, scheduling, communication, and synchronization, but also affect the efficiency of memory access. In short, optimizing algorithms on GPGPU platforms is difficult, complicated, and inefficient. This paper proposes a domain-specific language, ParaC, which captures high-level program semantics through new language extensions. ParaC obtains the application's software characteristics, such as operation information, data reuse among parallel tasks, and memory access patterns, and combines them with hardware platform information and a domain-knowledge-driven optimization mechanism to generate high-performance GPGPU code automatically. A source-to-source compiler then outputs standard OpenCL programs. Experimental results on the test cases show that the optimized versions automatically generated by ParaC achieve a speedup of up to 3.22× over the hand-tuned versions, while requiring only 1.2% to 39.68% of their lines of code.
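    As a rough illustration of two application characteristics the abstract mentions, border-handling rules and data reuse among parallel tasks, the following Python sketch implements a 3x3 mean filter with clamp-to-edge borders and counts the input reads needed for one tile of outputs. It is not ParaC syntax and not code generated by ParaC; the filter, the clamping rule, the tile sizes, and the names `box3x3` and `reads_per_tile` are hypothetical choices for this example. Because neighboring outputs share most of their 3x3 windows, loading a tile plus its one-pixel halo once (for instance into OpenCL `__local` memory) needs far fewer reads than letting every output fetch its own window. This tiling is a generic GPU optimization assumed here for illustration, not a description of ParaC's internal optimization model.

```python
from typing import List, Tuple

Image = List[List[float]]


def clamp(v: int, lo: int, hi: int) -> int:
    """Clamp an index into [lo, hi]; this is the (assumed) border rule."""
    return max(lo, min(v, hi))


def box3x3(img: Image) -> Image:
    """Hypothetical 3x3 mean filter; out-of-range neighbors use the nearest edge pixel."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):      # on a GPU, each (y, x) would typically be one work-item
        for x in range(w):
            s = 0.0
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    s += img[clamp(y + dy, 0, h - 1)][clamp(x + dx, 0, w - 1)]
            out[y][x] = s / 9.0
    return out


def reads_per_tile(tile_w: int, tile_h: int, radius: int = 1) -> Tuple[int, int]:
    """Input reads needed to compute a tile_w x tile_h block of outputs.

    naive:  every output reads its own (2r+1)^2 window from global memory.
    staged: the tile plus an r-pixel halo is loaded once and then reused.
    """
    k = 2 * radius + 1
    naive = tile_w * tile_h * k * k
    staged = (tile_w + 2 * radius) * (tile_h + 2 * radius)
    return naive, staged


if __name__ == "__main__":
    print(reads_per_tile(16, 16))
```

    For a 16x16 tile, `reads_per_tile(16, 16)` returns `(2304, 324)`: 256 outputs times 9 reads each, versus a single load of the 18x18 tile including its halo, roughly a 7x reduction in reads.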

Cite this article

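Lu Xingjing, Liu Lei, Jia Haipeng, Feng Xiaobing, Wu Chenggang. ParaC: A Domain Programming Framework of Image Processing on GPU Accelerators. Journal of Software (软件学报), 2017, 28(7): 1655-1675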

Article history
  • Received: 2016-09-05
  • Last revised: 2016-10-14
  • Published online: 2016-11-26
Copyright: Institute of Software, Chinese Academy of Sciences