Abstract:Software pipelining is a loop scheduling technique that extracts instruction level parallelism by overlapping the execution of several consecutive iterations. One of its drawbacks is the high register requirements, which may lead to software pipelining failure due to insufficient static general registers in Itanium. This paper evaluates the register requirements of software-pipelined loops and presents three new methods for software pipelining loops that require more static general registers than those available in Itanium processor. They reduce register pressure by either reducing instructions in the loop body or allocating stacked non-rotating registers or rotating register in register stack to serve as static registers. These methods are better than the existing techniques in that they further improve performance gain from software pipelining by increasing software-pipelined loops. These methods have been implemented in open research compiler (ORC) targeted for Itanium processor, and they perform well on loops of the programs in NAS Benchmarks. For some benchmarks, the performance is improved by more than 21%.