Automatic Parallelization Framework for Complex Nested Loops Based on LLVM Pass

doi:10.13328/j.cnki.jos.006858

Journal of Software (软件学报), 2023, Vol. 34, No. 7, pp. 3022-3042. DOI: 10.13328/j.cnki.jos.006858

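About the authors: Ma Chunyan (马春燕, 1978-), female, Ph.D., associate professor, doctoral supervisor; her research interests include modeling and verification of embedded software systems, automated software testing, and fault localization. Ye Xujiao (叶许姣, 1998-), female, master's student; her research interests include automatic parallelization and program analysis. Lü Bingxu (吕炳旭, 1998-), male, Ph.D. student; his research interests include parallel compilation and program optimization. Zhang Yu (张雨, 1983-), male, Ph.D., associate professor, doctoral supervisor, CCF professional member; his research interests include co-design and verification of intelligent embedded systems and intelligent information processing.
Corresponding author: Zhang Yu, E-mail: yuzhang2015@hainanu.edu.cn
Funding: National Natural Science Foundation of China (62192733, 62062030); Aeronautical Science Foundation (20185853038, 201907053004)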

Abstract:

With the widespread adoption of multi-core processors, automatic parallelization of serial code in embedded legacy systems has become an active research topic. Automatically parallelizing complex nested loops, that is, loops with imperfectly nested structure and non-affine dependences, remains technically challenging. This study proposes CNLPF, an automatic parallelization framework for complex nested loops based on LLVM Pass. First, a representation model for complex nested loops, the loop structure tree, is proposed, and the regular regions of nested loops are automatically converted into loop structure trees. Then, data dependence analysis is performed on the loop structure tree to construct intra-loop and inter-loop dependence relations. Finally, parallel loop code is generated based on the OpenMP shared-memory programming model. Six programs from the SPEC2006 data set, containing nearly 500 complex nested loops, were used to measure the proportion of complex nested loops and the parallel speedup. The results show that the proposed framework can handle complex nested loops that LLVM Polly cannot optimize, which strengthens the parallel compilation and optimization capabilities of LLVM, and that combining the method with Polly yields a speedup 9%-43% higher than using Polly alone.
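The three stages named in the abstract (a tree representation of the loop nest, dependence analysis over that tree, and OpenMP code generation) can be pictured with a small source-level sketch. This is not CNLPF itself, which works as an LLVM pass and whose algorithms the abstract does not detail. The names `Access`, `Stmt`, `LoopNode`, `is_parallel`, and `annotate`, as well as the loop bound `n`, are hypothetical. The dependence test is a deliberately crude, conservative stand-in: a loop is marked parallel only if every array written in its subtree is always accessed with one identical subscript that contains the loop variable as a whole component, assuming distinct array names do not alias.

```python
from dataclasses import dataclass, field
from typing import List, Tuple, Union


@dataclass
class Access:
    array: str                  # name of the array being accessed
    subscript: Tuple[str, ...]  # one index expression per dimension
    is_write: bool


@dataclass
class Stmt:
    text: str                   # source text of the statement
    accesses: List[Access]


@dataclass
class LoopNode:
    var: str                    # induction variable of this loop
    body: List[Union["LoopNode", Stmt]] = field(default_factory=list)


def collect(node):
    """Gather every array access in the subtree rooted at node."""
    if isinstance(node, Stmt):
        return list(node.accesses)
    found = []
    for child in node.body:
        found.extend(collect(child))
    return found


def is_parallel(loop):
    """Conservative test (an assumption of this sketch, not the paper's)."""
    accesses = collect(loop)
    written = {a.array for a in accesses if a.is_write}
    for name in written:
        subs = {a.subscript for a in accesses if a.array == name}
        # All accesses to a written array must share a single subscript,
        # and that subscript must contain the loop variable itself.
        if len(subs) != 1 or loop.var not in next(iter(subs)):
            return False
    return True


def annotate(node, depth=0, inside_parallel=False):
    """Emit C-like text, marking the outermost parallelizable loops."""
    pad = "    " * depth
    if isinstance(node, Stmt):
        return [pad + node.text]
    parallel = (not inside_parallel) and is_parallel(node)
    lines = [pad + "#pragma omp parallel for"] if parallel else []
    lines.append(f"{pad}for (int {node.var} = 0; {node.var} < n; {node.var}++) {{")
    for child in node.body:
        lines.extend(annotate(child, depth + 1, inside_parallel or parallel))
    lines.append(pad + "}")
    return lines


# An imperfect nest: a statement and two sibling loops inside loop i.
# The last loop writes through the non-affine subscript idx[k].
nest = LoopNode("i", [
    Stmt("b[i] = 0;", [Access("b", ("i",), True)]),
    LoopNode("j", [Stmt("c[i][j] = 2 * a[i][j];", [
        Access("c", ("i", "j"), True),
        Access("a", ("i", "j"), False),
    ])]),
    LoopNode("k", [Stmt("h[idx[k]] += 1;", [
        Access("h", ("idx[k]",), True),
        Access("h", ("idx[k]",), False),
        Access("idx", ("k",), False),
    ])]),
])
```

Calling `annotate(nest)` places a pragma only on the `j` loop: the indirect write `h[idx[k]]` blocks both the `k` loop and the enclosing `i` loop under this simple test. Handling such non-affine cases more precisely is the part the abstract attributes to CNLPF and that this sketch does not attempt.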

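Cite this article

Ma Chunyan, Lü Bingxu, Ye Xujiao, Zhang Yu. Automatic parallelization framework for complex nested loops based on LLVM Pass. Journal of Software (软件学报), 2023, 34(7): 3022-3042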

History

Received: 2022-09-04
Last revised: 2022-10-08
Published online: 2022-12-30
Publication date: 2023-07-06
