Loop-Nest Auto-Vectorization Based on SLP

doi:10.3724/SP.J.1001.2012.04106

Journal of Software, Volume 23, Issue 7, 2012, pp. 1717-1728

Authors: WEI Shuai, ZHAO Rong-Cai, YAO Yuan
Affiliation: Information Engineering College, PLA Information Engineering University, Zhengzhou 450002, China

Abstract:

More and more processors now integrate SIMD (single instruction multiple data) extensions, and most compilers implement automatic vectorization. However, vectorization usually targets only the innermost loop, and there has been no simple approach to vectorizing loop nests. This paper presents an automatic vectorization approach that vectorizes nested loops from the outermost loop inward. First, dependence analysis determines whether each loop can be directly unrolled-and-jammed. Next, the approach collects the loop attributes that influence vectorization performance: whether the loop can be directly unrolled-and-jammed, the number of array references that are contiguous with respect to the loop's index, and the loop region. An aggressive algorithm then decides which loops to unroll-and-jam, and finally SIMD code is generated with the SLP (superword level parallelism) algorithm. Test results on an Intel platform show that the average speedup of a set of numerical/video/communication kernels achieved by this approach is 2.13/1.41, better than both innermost-loop vectorization and simple outer-loop vectorization; the speedup of some common kernels reaches 5.3.
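The two per-loop facts named above, whether a loop can be directly unrolled-and-jammed and how many array references are contiguous in its index, can be illustrated with a minimal Python sketch. It assumes a perfect, rectangular loop nest, dependences summarized as direction vectors, and row-major (C-style) arrays. The names `can_unroll_and_jam` and `contiguous_refs` and the subscript representation are hypothetical, and the legality rule shown is the textbook condition (unroll-and-jam equals strip-mining a loop and moving its element loop innermost), not necessarily the exact analysis used in this paper.

```python
from typing import Dict, List, Optional, Sequence

# Direction symbols for one loop level of a dependence direction vector.
LT, EQ, GT, STAR = "<", "=", ">", "*"

# Hypothetical subscript form: loop index -> coefficient (constant term omitted).
Subscript = Dict[str, int]


def first_non_eq(dirs: Sequence[str]) -> Optional[str]:
    """Return the first direction that is not '=', or None if all are '='."""
    for d in dirs:
        if d != EQ:
            return d
    return None


def can_unroll_and_jam(level: int, deps: List[Sequence[str]]) -> bool:
    """Conservative legality test for unroll-and-jam of loop `level`
    (0 = outermost) in a perfect, rectangular loop nest.

    Jamming moves the unrolled copies inside all inner loops, so a
    dependence carried by `level` stays legal only if its first non-'='
    inner direction is '<' (or all inner directions are '=').
    Unknown directions ('*') are treated conservatively.
    """
    for vec in deps:
        if first_non_eq(vec[:level]) == LT:
            continue  # carried by an outer loop: unaffected
        if vec[level] == EQ:
            continue  # not carried by this loop: unaffected
        inner = first_non_eq(vec[level + 1:])
        if inner is None:
            continue  # same inner iteration: order among copies is kept
        if vec[level] == LT and inner == LT:
            continue  # still lexicographically positive after jamming
        return False  # jamming could reverse this dependence
    return True


def contiguous_refs(loop_var: str, refs: List[List[Subscript]]) -> int:
    """Count references that are unit-stride in `loop_var`, assuming
    row-major layout: the index must appear with coefficient 1 in the
    last subscript and with coefficient 0 in every other subscript."""
    count = 0
    for subs in refs:
        *leading, last = subs
        if last.get(loop_var, 0) == 1 and all(
            s.get(loop_var, 0) == 0 for s in leading
        ):
            count += 1
    return count


# Example nest (hypothetical):
#   for i: for j: A[i][j] = A[i-1][j+1] + B[j][i]
refs = [
    [{"i": 1}, {"j": 1}],  # A[i][j]
    [{"i": 1}, {"j": 1}],  # A[i-1][j+1] (constants omitted)
    [{"j": 1}, {"i": 1}],  # B[j][i]
]
deps = [(LT, GT)]  # A[i][j] -> A[i-1][j+1]: distance (1, -1)

print(can_unroll_and_jam(0, deps))  # False
print(can_unroll_and_jam(1, deps))  # True
print(contiguous_refs("j", refs), contiguous_refs("i", refs))  # 2 1
```

In this example the flow dependence from A[i][j] to A[i-1][j+1] has direction (<, >): after jamming two copies of the i loop, the copy for i+1 would read A[i][j+1] at step j, before the copy for i writes it at step j+1, so the i loop cannot be directly unrolled-and-jammed. The j loop, by contrast, has two unit-stride references against one for i.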

Key words: SIMD (single instruction multiple data); vectorization; data dependence analysis; nested loop; SLP (superword level parallelism)

Citation:

WEI Shuai, ZHAO Rong-Cai, YAO Yuan. Loop-Nest Auto-Vectorization Based on SLP. Journal of Software, 2012, 23(7): 1717-1728.

History
Received: April 19, 2011
Revised: July 21, 2011
Online: July 3, 2012

Copyright: Institute of Software, Chinese Academy of Sciences
Address: 4# South Fourth Street, Zhong Guan Cun, Beijing 100190
Phone: 010-62562563 Fax: 010-62562533 Email: jos@iscas.ac.cn