Automatic Mixed Precision Optimization for Stencil Computation

doi:10.13328/j.cnki.jos.006757

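Authors: Song Guanghui (宋广辉), Guo Shaozhong (郭绍忠), Zhao Jie (赵捷), Tao Xiaohan (陶小涵), Li Fei (李飞), Xu Jinchen (许瑾晨)
Author biographies: Song Guanghui (b. 1997), male, master's student; research interests: high-performance computing and advanced compilation techniques. Guo Shaozhong (b. 1964), female, professor, CCF senior member; research interests: high-performance computing and distributed processing. Zhao Jie (b. 1987), male, lecturer, CCF professional member; research interest: advanced compilation techniques. Tao Xiaohan (b. 1996), male, doctoral student; research interest: advanced compilation techniques. Li Fei (b. 1996), male, master's student; research interest: high-performance computing. Xu Jinchen (b. 1987), male, lecturer; research interest: high-performance computing.
Corresponding author: Xu Jinchen, E-mail: atao728208@126.com
CLC number: TP18
Funding: National Natural Science Foundation of China (U20A20226)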

Abstract:

Mixed precision has driven substantial progress in deep learning and in precision tuning and optimization, and existing studies indicate that mixed precision optimization for Stencil computation remains a challenging direction. Meanwhile, results achieved with the polyhedral model in automatic parallelization show that the model provides a sound mathematical abstraction for loop nests, on which a series of loop transformations can be built. Based on polyhedral compilation technology, this study designs and implements an automatic mixed precision optimizer for Stencil computation. By performing iteration space partitioning, data flow analysis, and schedule tree transformation at the intermediate representation level, it achieves, for the first time, automatic source-to-source generation of mixed precision code for Stencil computation. Experiments show that the automatically optimized code reduces precision redundancy while fully exploiting the parallelism of the program, thereby improving performance. Taking the high-precision version as the baseline, the maximum speedup is 1.76 and the geometric mean speedup is 1.15 on the x86 platform; on the new-generation domestic Sunway platform, the maximum speedup is 1.64 and the geometric mean speedup is 1.20.
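The idea of assigning different precisions to different parts of a stencil's iteration space can be illustrated with a small sketch. The code below is not the optimizer described in the abstract and does not use polyhedral tooling. It is a hand-written 1D three-point averaging stencil in which a hypothetical predicate `low_precision(i)` decides which grid points are rounded to single precision after each update. The names `to_f32`, `jacobi_1d`, and `max_abs_diff`, the midpoint split, and the emulation of binary32 through `struct` are all assumptions made for illustration. The sketch shows only the accuracy side of the trade-off; it says nothing about the speedups reported above.

```python
import struct


def to_f32(x: float) -> float:
    """Round a Python float (binary64) to the nearest binary32 value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def jacobi_1d(u0, steps, low_precision=lambda i: False):
    """Apply a 3-point averaging stencil for `steps` sweeps.

    Points where low_precision(i) is True are rounded to float32 after
    every update (an illustrative policy, not the paper's analysis).
    The two boundary points are held fixed.
    """
    u = list(u0)
    n = len(u)
    for _ in range(steps):
        new = u[:]  # copy keeps the boundary values unchanged
        for i in range(1, n - 1):
            v = (u[i - 1] + u[i] + u[i + 1]) / 3.0
            new[i] = to_f32(v) if low_precision(i) else v
        u = new
    return u


def max_abs_diff(a, b):
    """Largest pointwise absolute difference between two grids."""
    return max(abs(x - y) for x, y in zip(a, b))


if __name__ == "__main__":
    n, steps = 64, 100
    u0 = [(i % 5) / 5.0 for i in range(n)]

    # Reference: every point stays in double precision.
    ref = jacobi_1d(u0, steps)

    # Mixed: the lower half of the iteration space uses single precision.
    mixed = jacobi_1d(u0, steps, low_precision=lambda i: i < n // 2)

    print("max abs error of mixed vs. double:", max_abs_diff(mixed, ref))
```

Because the stencil is a convex combination of its neighbors, the rounding error introduced per sweep is not amplified, so the deviation from the double-precision result grows at most linearly with the number of sweeps. A real optimizer would choose the partition from data flow and accuracy analysis instead of a fixed midpoint.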

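Cite this article

Song Guanghui, Guo Shaozhong, Zhao Jie, Tao Xiaohan, Li Fei, Xu Jinchen. Automatic Mixed Precision Optimization for Stencil Computation. Journal of Software (软件学报), 2023, 34(12): 5704-5723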

History

Received: 2022-04-01
Last revised: 2022-06-11
Accepted:
Published online: 2023-02-22
Publication date: 2023-12-06
