Abstract: The on-chip memory hierarchy of Sunway many-core processors is an important structure for alleviating the many-core "memory access wall". The scratchpad memory (SPM) structure and the on-chip remote memory access (RMA) communication mechanism are managed entirely by software. This brings many opportunities for improving application performance, but it also poses great challenges for the development, optimization, and porting of applications. To fully exploit the hierarchical features of on-chip memory, improve application performance, and reduce the programming and optimization burden on users, this study proposes a compiler optimization method that integrates multi-level memory access and communication. First, the method defines a fused compiler directive that passes high-level program information to the compiler. Second, it builds a compiler optimization benefit model and designs an iterative, heuristic framework for solving the loop optimization scheme. The compiler then solves for the loop optimization scheme and performs the corresponding code transformation. It generates batched DMA and RMA data transfer operations that buffer core data from lower levels of the storage hierarchy, which have high access latency, into higher levels with low access latency. Optimization experiments and analysis are conducted on three typical test cases. The results show that programs optimized by this method achieve performance comparable to manual optimization and significantly better than the unoptimized versions.
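As a rough illustration of the kind of loop transformation described above, the following C sketch tiles a simple loop and stages each tile through a small local buffer. It is a minimal sketch under stated assumptions, not the compiler's actual output: `SPM_TILE`, `batch_get`, `batch_put`, `scale_naive`, and `scale_tiled` are hypothetical names, the local arrays merely stand in for SPM, and `memcpy` stands in for the batched DMA transfers that would be issued on real hardware.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical tile size (in elements) assumed to fit in the local buffer. */
#define SPM_TILE 4

/* Stand-ins for batched transfers; memcpy replaces a real DMA get/put here. */
static void batch_get(double *local, const double *global, size_t n) {
    memcpy(local, global, n * sizeof(double));
}

static void batch_put(double *global, const double *local, size_t n) {
    memcpy(global, local, n * sizeof(double));
}

/* Original loop: every access goes directly to the high-latency memory. */
void scale_naive(double *out, const double *in, size_t n, double factor) {
    for (size_t i = 0; i < n; i++) {
        out[i] = factor * in[i];
    }
}

/* Transformed loop: each tile is fetched in one batch, computed on locally,
 * and written back in one batch. */
void scale_tiled(double *out, const double *in, size_t n, double factor) {
    double buf_in[SPM_TILE];   /* stands in for an SPM input buffer  */
    double buf_out[SPM_TILE];  /* stands in for an SPM output buffer */

    assert(n == 0 || (in != NULL && out != NULL));

    for (size_t base = 0; base < n; base += SPM_TILE) {
        /* The last tile may be shorter than SPM_TILE. */
        size_t len = (n - base < SPM_TILE) ? (n - base) : SPM_TILE;

        batch_get(buf_in, in + base, len);
        for (size_t j = 0; j < len; j++) {
            buf_out[j] = factor * buf_in[j];
        }
        batch_put(out + base, buf_out, len);
    }
}
```

Both functions compute the same result; the tiled version only changes where the data lives while it is being processed. In the method summarized above, choosing the tile size and deciding which arrays to buffer would be the job of the benefit model and the heuristic solver rather than a fixed constant.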