Vector Math Software Package Based on the Sunway BlueLight Processor

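Fund Project:

National Natural Science Foundation of China (61133005, 61272136); National High Technology Research and Development Program of China (863 Program) (2012AA010902, 2012AA010903); Graduate Science and Technology Innovation and Social Practice Funding of the Chinese Academy of Sciences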


Package of the Vector Math Library Based on the Sunway Processor

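    Abstract:

    This paper first introduces SIMD extension technology and analyzes three ways of using SIMD extensions, arguing that calling third-party libraries optimized for a specific target platform is a good way for application developers to quickly build efficient parallel programs. Second, it introduces the domestic Sunway SW-1600 processor platform and describes SW-VML (SW Vector Math Library), which was developed with SIMD extensions, loop unrolling, and other techniques. During development, the authors proposed two optimizations, aligned memory access and simplified vector conditional branches, which remove the performance penalties caused by unaligned memory accesses and by conversions between vector and scalar arrays. Based on the SW compiler's support for OpenMP, a multithreaded OpenMP version was also developed. Finally, SW-VML was tested on the SW-1600 platform with vectors of different sizes. The results show that SIMD vectorization achieves a speedup of 2.08 over the serial program, and that 4 threads achieve an average speedup of 2.26 over a single thread. SW-VML is a vector function package for developing efficient programs on the domestic Sunway processor series, and a basic software toolkit for developing high-performance programs on a single compute node of the Sunway BlueLight high-performance computing platform.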
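The aligned-access and loop-unrolling ideas in the abstract can be illustrated with a portable C sketch. It does not use the SW-1600's actual SIMD instructions (the abstract does not name them); instead it assumes a hypothetical vector width of 4 doubles, peels scalar iterations until the output pointer is aligned to that width, processes the body in unrolled blocks of 4, and finishes with a scalar tail. The function `vd_square` and the placeholder `kernel` are hypothetical names chosen for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical vector width: 4 doubles (32 bytes), assumed for illustration only. */
#define VLEN 4
#define VBYTES (VLEN * sizeof(double))

/* Placeholder element-wise kernel; a real vector math routine would evaluate exp, log, etc. */
static inline double kernel(double v) { return v * v; }

/* Hypothetical routine: y[i] = kernel(x[i]) for i in [0, n).
 * 1. Scalar prologue until y + i is VBYTES-aligned (alignment peeling).
 * 2. Unrolled main loop over blocks of VLEN elements; with SIMD intrinsics
 *    each block would become one aligned vector store.
 * 3. Scalar epilogue for the remaining n mod VLEN elements.
 * Only y is aligned here; if x has a different offset, loads from x stay unaligned. */
void vd_square(size_t n, const double *x, double *y)
{
    assert(n == 0 || (x != NULL && y != NULL));
    size_t i = 0;

    /* Prologue: peel leading elements until the output pointer is aligned. */
    while (i < n && ((uintptr_t)(y + i) % VBYTES) != 0) {
        y[i] = kernel(x[i]);
        i++;
    }

    /* Main body: VLEN independent operations per iteration (loop unrolling). */
    for (; i + VLEN <= n; i += VLEN) {
        y[i]     = kernel(x[i]);
        y[i + 1] = kernel(x[i + 1]);
        y[i + 2] = kernel(x[i + 2]);
        y[i + 3] = kernel(x[i + 3]);
    }

    /* Epilogue: leftover elements that do not fill a whole block. */
    for (; i < n; i++)
        y[i] = kernel(x[i]);
}
```

Peeling on the output side is one common choice because aligned stores are often the costlier case; a library can also align both streams by requiring aligned allocation from the caller.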

    Abstract:

    This paper first introduces SIMD (single instruction, multiple data) extension technology and analyzes three ways of using SIMD instructions, arguing that calling a third-party library already optimized for the target platform is an effective way for application developers to quickly build efficient parallel programs. Next, it introduces the China-developed SW-1600 processor and SW-VML, a package of vector math functions built with SIMD extensions. To eliminate the overhead caused by unaligned memory accesses and by conversions between vector and scalar arrays, the paper proposes several optimizations, including aligned memory access, simplified vector conditional branches, and loop unrolling. A multithreaded version of SW-VML based on OpenMP is also provided. Finally, the functions in the package are tested on the SW-1600 with arrays of different sizes. The results show that SIMD vectorization delivers a clear performance gain: compared with conventional scalar computation, the average speedup is 2.06, and using 4 threads yields an average speedup of 2.26 over a single thread. SW-VML is a general-purpose vector function package for the domestic Sunway processor series and can serve as a basic toolkit for high-performance computing on the Sunway platform.
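The "simplified vector conditional branch" and OpenMP ideas can be sketched together, under stated assumptions. The abstract does not describe the authors' exact method, so the sketch below uses a generic technique: evaluate both branches of a piecewise function for every element and combine them with a bit mask, the way a SIMD select/blend instruction does, instead of copying a vector to a scalar array and testing each lane. The function `vd_piecewise`, the helper `blend`, and the piecewise function itself are hypothetical. The loop is parallelized with a standard OpenMP `parallel for`; without OpenMP support (for example, without `-fopenmp` in GCC) the pragma is ignored and the loop runs serially with identical results.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Returns a where mask is all ones and b where mask is all zeros.
 * This emulates one lane of a SIMD select (blend) on the raw bits. */
static inline double blend(uint64_t mask, double a, double b)
{
    uint64_t ua, ub, ur;
    double r;
    memcpy(&ua, &a, sizeof ua);
    memcpy(&ub, &b, sizeof ub);
    ur = (ua & mask) | (ub & ~mask);
    memcpy(&r, &ur, sizeof r);
    return r;
}

/* Hypothetical piecewise function: f(x) = 0 for x < 0, x*x otherwise.
 * Both branch results are computed and merged with a mask, so the loop body
 * contains no data-dependent if/else. */
void vd_piecewise(ptrdiff_t n, const double *x, double *y)
{
    assert(n >= 0);
    /* schedule(static) gives each thread a contiguous block of iterations. */
    #pragma omp parallel for schedule(static)
    for (ptrdiff_t i = 0; i < n; i++) {
        double v = x[i];
        uint64_t mask = -(uint64_t)(v < 0.0); /* all ones if v < 0, else 0 */
        y[i] = blend(mask, 0.0, v * v);
    }
}
```

Evaluating both branches costs extra arithmetic, but it keeps every lane on the same instruction stream, which is usually cheaper than leaving the vector registers to test lanes one by one.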

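Cite this article

XIE Qingchun, ZHANG Yunquan, LI Yan, PANG Renbo, WU Zailong, LU Yongquan, GAO Pengdong. Package of the vector math library based on the Sunway processor. Journal of Software, 2014, 25(S2): 70-79.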

History
  • Received: 2013-08-05
  • Last revised: 2014-03-13
  • Published online: 2015-01-29
Copyright: Institute of Software, Chinese Academy of Sciences