Task-Parallel Scheduling System for the SW26010 Many-core Processor and Its Application to Nested Parallel Algorithms

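About the authors:

Sun Qiao (1989-), male, Ph.D., senior engineer. His main research interests are parallel programming models and parallel algorithms.
Zhao Hui (1984-), female, Ph.D., assistant researcher. Her main research interest is high-performance computing.
Li Leisheng (1981-), male, Ph.D., associate researcher. His main research interest is parallel computing.
Wu Changmao (1974-), male, Ph.D., associate researcher, CCF professional member. His main research interests are parallel algorithms and parallel software, large-scale rendering algorithms, and numerical computing on heterogeneous platforms.
Zhao Haitao (1981-), male, Ph.D., associate researcher, CCF professional member. His main research interests are high-performance engineering and scientific computing.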

Corresponding author:

Wu Changmao, E-mail: changmaowu@foxmail.com

Chinese Library Classification (CLC) number:

TP303

Funding:

Strategic Priority Research Program of the Chinese Academy of Sciences (Category C) (XDC01030200)


Task Parallel Framework and Its Application in Nested Parallel Algorithms on the SW26010 Many-core Platform

Fund Project:

Strategic Priority Research Program of the Chinese Academy of Sciences (Category C) (XDC01030200)

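    Abstract (translated from the Chinese):

    Task parallelism is a fundamental design pattern in parallel programming. However, because of the complexity of the algorithms themselves and the particularities of the target platform, designing and implementing efficient task-parallel programs is often challenging for programmers. Targeting the emerging SW26010 many-core CPU, this paper proposes SWAN, a general-purpose runtime framework that supports nested task parallelism. SWAN provides high-level abstractions for implementing task-parallel programs, allowing programmers to concentrate on the algorithm logic and thereby improving development efficiency. In terms of performance, SWAN partitions many shared resources at a fine granularity, which effectively avoids intense contention for shared resources among the many threads. The core data structures of SWAN are optimized to reduce the framework's own overhead by making full use of platform features such as the high-speed memory access mechanism, the fast software-controllable cache, and atomic operations. SWAN also supports dynamic load balancing, so that the resources of every processor core are fully utilized. Based on SWAN, several typical nested parallel algorithms with recursive characteristics were implemented on the target platform, including the N-queens problem, binary tree traversal, quicksort, and convex hull computation. Experimental results show that the algorithms parallelized with SWAN achieve speedups of 4.5x to 32x over their serial versions, demonstrating that SWAN offers high practicality and performance.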

    Abstract:

    Task parallelism is one of the fundamental patterns for designing parallel algorithms. However, owing to algorithmic complexity and distinctive hardware features, implementing algorithms with task parallelism often remains challenging. This study proposes SWAN, a general runtime framework that supports nested task parallelism on the emerging SW26010 many-core CPU platform. SWAN provides high-level abstractions for implementing task parallelism, so that programmers can focus on the algorithm itself and work more productively. For performance, the shared resources and information managed by SWAN are partitioned in a fine-grained manner to avoid heavy contention among worker threads. The core data structures of SWAN exploit the platform's high-bandwidth memory access mechanism, fast on-chip scratchpad cache, and atomic operations to reduce the overhead of SWAN itself. In addition, SWAN applies dynamic load-balancing strategies at runtime to keep the threads fully occupied. In the experiments, a set of recursive algorithms with nested parallelism, including the N-queens problem, binary-tree traversal, quicksort, and convex hull, are implemented with SWAN on the target platform. The results show that these algorithms achieve speedups of 4.5x to 32x over their serial counterparts, which indicates that SWAN offers both high usability and high performance.
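
    The abstract does not show SWAN's programming interface, so the following is only a minimal sketch of the general ideas it names: nested task spawning, per-worker task queues (one way to partition a shared resource at a fine granularity), and dynamic load balancing through work stealing. Every name here (`WorkStealingPool`, `spawn`, `count_queens`, `cutoff`) is hypothetical and is not taken from SWAN. Python threads are used purely for readability; in standard CPython the global interpreter lock prevents a real speedup, and nothing below reflects SW26010 hardware features.

```python
import threading
import time
from collections import deque


class WorkStealingPool:
    """Toy task pool (hypothetical, not SWAN): one deque per worker."""

    def __init__(self, num_workers=4):
        self.queues = [deque() for _ in range(num_workers)]
        self.locks = [threading.Lock() for _ in range(num_workers)]
        self.pending = 0                      # tasks spawned but not yet finished
        self.pending_lock = threading.Lock()
        self.local = threading.local()        # stores each worker's own index

    def spawn(self, fn, *args):
        """Queue a task. May be called from inside a running task (nesting)."""
        wid = getattr(self.local, "wid", 0)   # outside a worker: use queue 0
        with self.pending_lock:
            self.pending += 1
        with self.locks[wid]:
            self.queues[wid].append((fn, args))

    def _take(self, wid):
        # Prefer the newest task in the worker's own queue.
        with self.locks[wid]:
            if self.queues[wid]:
                return self.queues[wid].pop()
        # Otherwise steal the oldest task from another worker's queue.
        for victim in range(len(self.queues)):
            if victim != wid:
                with self.locks[victim]:
                    if self.queues[victim]:
                        return self.queues[victim].popleft()
        return None

    def _worker(self, wid):
        self.local.wid = wid
        while True:
            task = self._take(wid)
            if task is None:
                with self.pending_lock:
                    if self.pending == 0:     # nothing queued, nothing running
                        return
                time.sleep(0.0001)            # back off briefly, then retry
                continue
            fn, args = task
            try:
                fn(*args)
            finally:
                with self.pending_lock:
                    self.pending -= 1

    def run(self):
        threads = [threading.Thread(target=self._worker, args=(i,))
                   for i in range(len(self.queues))]
        for t in threads:
            t.start()
        for t in threads:
            t.join()


def count_queens(n, num_workers=4, cutoff=2):
    """Count N-queens solutions; rows above `cutoff` are expanded as tasks."""
    pool = WorkStealingPool(num_workers)
    full = (1 << n) - 1                       # bitmask with one bit per column
    total = [0]
    total_lock = threading.Lock()

    def serial(row, cols, d1, d2):
        if row == n:
            return 1
        count = 0
        free = full & ~(cols | d1 | d2)       # columns not under attack
        while free:
            bit = free & -free                # lowest free column
            free ^= bit
            count += serial(row + 1, cols | bit,
                            ((d1 | bit) << 1) & full, (d2 | bit) >> 1)
        return count

    def task(row, cols, d1, d2):
        if row >= cutoff or row == n:         # deep enough: finish serially
            found = serial(row, cols, d1, d2)
            with total_lock:
                total[0] += found
            return
        free = full & ~(cols | d1 | d2)
        while free:
            bit = free & -free
            free ^= bit
            pool.spawn(task, row + 1, cols | bit,   # nested child task
                       ((d1 | bit) << 1) & full, (d2 | bit) >> 1)

    pool.spawn(task, 0, 0, 0, 0)
    pool.run()
    return total[0]
```

    Calling `count_queens(8)` runs the search with four workers. The `cutoff` parameter is a design choice of this sketch: it bounds how many rows are expanded into separate tasks, so that task-management overhead does not dominate the fine-grained work near the leaves of the recursion.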

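Cite this article

Sun Qiao, Li Leisheng, Zhao Haitao, Zhao Hui, Wu Changmao. Task Parallel Framework and Its Application in Nested Parallel Algorithms on the SW26010 Many-core Platform. Journal of Software, 2021, 32(8): 2352-2364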

History
  • Received: 2019-08-22
  • Last revised: 2019-12-05
  • Published online: 2021-08-05
  • Publication date: 2021-08-06