面向国产异构众核系统的Parallel C语言设计与实现
作者:
作者单位:

作者简介:

通讯作者:

中图分类号:

基金项目:

国家重点基础研究发展计划(973)(2016YFB0200502);国家高技术研究发展计划(863)(2012AA010903,2015AA01A301);计算机体系结构国家重点实验室基金(CARCH201403)


Design and Implementation of Parallel C Programming Language for Domestic Heterogeneous Many-Core Systems
Author:
Affiliation:

Fund Project:

National Basic Research Program of China (973) (2016YFB0200502); National High Technology Research and Development Program of China (863)(2012AA010903, 2015AA01A301); Fund Projects of State Key Laboratory of Computer Architecture (CARCH201403)

  • 摘要
  • |
  • 图/表
  • |
  • 访问统计
  • |
  • 参考文献
  • |
  • 相似文献
  • |
  • 引证文献
  • |
  • 资源附件
  • |
  • 文章评论
    摘要:

    异构众核架构具有超高的性能功耗比,已成为超级计算机体系结构的重要发展方向.但众核系统更为复杂的并行层次和存储层次,给编程和优化带来了极大的挑战.因此,研究面向众核系统的并行编程技术,对于降低国产众核系统并行应用的编程难度、提升并行程序的性能都具有重要的意义.提出统一架构的多模式并行编程模型,包括异构融合的加速运算模型和按同构方式编程的自主运算模型,根据编程模型设计了Parallel C语言,能够有效地描述国产众核系统的异构并行性.与其他众核系统上MPI+X的使用模式相比,编程和系统优化都具有全局视角,在多级局部性描述、单边消息、兼容已有多核应用等方面具有特色;基于Open64构建了Parallel C编译系统,全面支持加速运算模型和自主运算模型,提出并实现了数据布局与自动DMA、编译指导的线程代理和拓扑位置感知的集合通信等优化.Micro Benchmark和实际应用在神威太湖之光计算机系统上的测试数据结果表明:Parallel C语言和编译系统具有良好的性能和可扩展性,能够有效支撑大型应用.

    Abstract:

    Heterogeneous many-core architecture, with ultra-high performance to power consumption ratio, has become an important trend of supercomputer architecture development. However, many-core systems always have more complex parallel hierarchy and memory hierarchy, hence posing a great challenge to programming and optimization. Therefore, the study of many-core-oriented parallel programming techniques is of great significance, since it can reduce the difficulty of parallel programming on domestic many-core systems and improve the performance of parallel programs. This work proposes a multi-model parallel programming model upon unified architecture, including heterogeneous-fused speedup programming model and isomorphic independent programming model. Based on this model, Parallel C programming language is designed to effectively describe heterogeneous parallelism of the domestic many-core system. Compared to MPI+X programming pattern, programming with Parallel C has a global perspective, as well as advantages in the hierarchy locality description, one-side message passing and multi-core applications compatibility. The Parallel C compiler system constructed with Open64 fully supports the heterogeneous-fused speedup programming model and isomorphic independent programming model. In addition, the design and implementation of data layout and automatic DMA optimization, compiler-directed thread proxy optimization and topology-aware collective communications optimization are presented. The performance of the proposed method is evaluated with the Miro Benchmark and practical applications on Sunway Taihu Light computer system. Experimental results show that Parallel C language and the compile system have good performance and scalability to effectively support large-scale applications.

    参考文献
    相似文献
    引证文献
引用本文

何王全,刘勇,方燕飞,魏迪,漆锋滨.面向国产异构众核系统的Parallel C语言设计与实现.软件学报,2017,28(4):764-785

复制
分享
文章指标
  • 点击次数:
  • 下载次数:
  • HTML阅读次数:
  • 引用次数:
历史
  • 收稿日期:2016-06-20
  • 最后修改日期:2016-09-08
  • 录用日期:
  • 在线发布日期: 2017-01-24
  • 出版日期:
文章二维码
您是第位访问者
版权所有:中国科学院软件研究所 京ICP备05046678号-3
地址:北京市海淀区中关村南四街4号,邮政编码:100190
电话:010-62562563 传真:010-62562533 Email:jos@iscas.ac.cn
技术支持:北京勤云科技发展有限公司

京公网安备 11040202500063号