MTTorch: Implementation and Optimization of a PyTorch Operator Library for the MT-3000 Chip and Transformer Models

doi:10.13328/j.cnki.jos.007244


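Authors:
WANG Hao-Tian (王昊天), College of Software, Nankai University, Tianjin 300450, China
SUN Yu-Fei (孙羽菲), College of Software, Nankai University, Tianjin 300450, China
SUI Yi-Cheng (隋轶丞), College of Software, Nankai University, Tianjin 300450, China
WANG Jia-Hao (王嘉豪), College of Software, Nankai University, Tianjin 300450, China
SHI Chang-Qing (石昌青), College of Software, Nankai University, Tianjin 300450, China
FANG Jian-Bin (方建滨), College of Computer Science and Technology, National University of Defense Technology, Changsha, Hunan 410073, China
ZHANG Yu-Zhi (张玉志), College of Software, Nankai University, Tianjin 300450, China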

                    
作者单位:
作者简介:
通讯作者:
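CLC number: TP303
Fund projects: National Key Research and Development Program of China (2021YFB0300104); Science and Technology Project of the Haihe Laboratory of Advanced Computing and Key Software (22HHXCJC00001); Innovation Fund of the Qiyuan Laboratory (2022-JCJO-LA-001-068)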


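Abstract:

With the rapid development of Transformer-based large models, computing power has become the bottleneck constraining the field, and accelerating and optimizing the training of large language models according to the structural characteristics of accelerator hardware has become a research hotspot. Targeting MT-3000, the accelerator chip of the new-generation Tianhe supercomputing system, this study proposes and implements MTTorch, a PyTorch extension library for the CPU+DSP heterogeneous architecture. Its core is a multi-core parallel operator library that provides vectorized, optimized implementations of the core operators used in training Transformer-based models. In addition, exploiting the architectural characteristics of MT-3000, the study proposes a high-performance reduction algorithm and a ping-pong algorithm for multi-core DSPs, which significantly improve operator performance. MTTorch is also general: it can be loaded as a dynamic link library by different versions of PyTorch without changing PyTorch's native implementation. Extensive experiments show that the implemented core operators perform well on the MT-3000 chip, reaching an 8× speedup on a single DSP cluster. Training with MTTorch on multiple nodes achieves near-linear speedup, greatly improving the training efficiency of Transformer-based models on the MT-3000 chip.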

Key words: PyTorch; high performance computing; Transformer model; Tianhe supercomputer; CPU+DSP heterogeneous computing; software ecosystem
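The abstract names a ping-pong algorithm and a reduction algorithm for multi-core DSPs but does not describe them. The following is a minimal Python sketch, not the authors' MTTorch code, of the general techniques those names usually refer to: double buffering, in which one on-chip buffer is filled while the other is computed on, followed by a pairwise tree reduction of per-core partial sums. The core count `NUM_CORES`, the tile size `TILE` and the helper `dma_load` are hypothetical; on real hardware the copy would be an asynchronous DMA transfer and the cores would run concurrently.

```python
from typing import List, Sequence

NUM_CORES = 4  # hypothetical core count, chosen for illustration only
TILE = 8       # hypothetical on-chip buffer capacity, in elements


def dma_load(dst: List[float], src: Sequence[float], start: int, n: int) -> int:
    """Hypothetical stand-in for an asynchronous DMA copy into on-chip memory.

    Here it is an ordinary synchronous copy, so no transfer actually overlaps
    with computation; only the buffer-alternation pattern is shown.
    """
    dst[:n] = src[start:start + n]
    return n


def core_partial_sum(x: Sequence[float], begin: int, end: int) -> float:
    """Sum x[begin:end] through two alternating (ping-pong) buffers."""
    if begin >= end:
        return 0.0
    bufs = [[0.0] * TILE, [0.0] * TILE]
    sizes = [0, 0]
    cur = 0
    # Prefetch the first tile into the active buffer.
    sizes[cur] = dma_load(bufs[cur], x, begin, min(TILE, end - begin))
    acc = 0.0
    off = begin
    while off < end:
        nxt = off + TILE
        if nxt < end:
            # Issue the load of the next tile into the idle buffer ...
            sizes[cur ^ 1] = dma_load(bufs[cur ^ 1], x, nxt, min(TILE, end - nxt))
        # ... while computing on the tile already resident in the active buffer.
        acc += sum(bufs[cur][:sizes[cur]])
        cur ^= 1  # swap the roles of the two buffers
        off = nxt
    return acc


def tree_reduce(partial: List[float]) -> float:
    """Combine per-core partial results pairwise in about log2(len) rounds."""
    vals = list(partial)
    stride = 1
    while stride < len(vals):
        for i in range(0, len(vals) - stride, 2 * stride):
            vals[i] += vals[i + stride]
        stride *= 2
    return vals[0]


def parallel_sum(x: Sequence[float]) -> float:
    """Split x across NUM_CORES simulated cores, then tree-reduce their partial sums."""
    chunk = -(-len(x) // NUM_CORES)  # ceiling division
    partial = []
    for c in range(NUM_CORES):  # each iteration stands in for one DSP core
        begin = min(c * chunk, len(x))
        end = min(begin + chunk, len(x))
        partial.append(core_partial_sum(x, begin, end))
    return tree_reduce(partial)


if __name__ == "__main__":
    data = [float(i) for i in range(1, 101)]
    print(parallel_sum(data))  # 1 + 2 + ... + 100 = 5050.0
```

Because the copy in this sketch is synchronous, it reproduces only the control flow; the performance benefit of ping-pong buffering comes from overlapping each transfer with computation on the other buffer, which requires an asynchronous copy engine.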


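Cite this article:

WANG Hao-Tian, SUN Yu-Fei, SUI Yi-Cheng, WANG Jia-Hao, SHI Chang-Qing, FANG Jian-Bin, ZHANG Yu-Zhi. MTTorch: Implementation and Optimization of a PyTorch Operator Library for the MT-3000 Chip and Transformer Models. Journal of Software (软件学报): 1-21.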


History:

Received: 2023-11-13
Revised: 2024-03-13
Published online: 2024-12-31
