Abstract:Heterogeneous architectures are dominating the realm of high-performance computing. However, these architectures also complicate the programming issue due to its increasingly complex hardware and memory hierarchy compared to homogeneous architectures. One of the most promising solutions to this issue is making use of optimizing compilers which can help programmers develop high-performance code executable on target machines, thereby simplifying the difficulty of programming. The polyhedral model is widely studied due to its ability to generate effective code and portability to various targets, which is realized by first converting a program into its intermediate representation and then combining the compositions of loop transformations and hardware binding strategies. This paper presents a source-to-source parallel code generator targeting the domestic, heterogeneous architecture of the Sunway machine using the polyhedral model. In particular, the computation is deployed automatedly onto the Sunway architecture and memory management, minimizing the amount of data movements between the management processing element and computing processing elements of the target. The experiments are conducted on 13 linear algebra applications extracted from the Polybench Benchmarks. The experimental results show that the proposed approach can generate effective code executable on the Sunway heterogeneous architecture, providing a mean speedup of 539.16× on 64 threads over the sequential implementation executed on a management processing element.