Abstract:Heterogeneous many-core architecture, with ultra-high performance to power consumption ratio, has become an important trend of supercomputer architecture development. However, many-core systems always have more complex parallel hierarchy and memory hierarchy, hence posing a great challenge to programming and optimization. Therefore, the study of many-core-oriented parallel programming techniques is of great significance, since it can reduce the difficulty of parallel programming on domestic many-core systems and improve the performance of parallel programs. This work proposes a multi-model parallel programming model upon unified architecture, including heterogeneous-fused speedup programming model and isomorphic independent programming model. Based on this model, Parallel C programming language is designed to effectively describe heterogeneous parallelism of the domestic many-core system. Compared to MPI+X programming pattern, programming with Parallel C has a global perspective, as well as advantages in the hierarchy locality description, one-side message passing and multi-core applications compatibility. The Parallel C compiler system constructed with Open64 fully supports the heterogeneous-fused speedup programming model and isomorphic independent programming model. In addition, the design and implementation of data layout and automatic DMA optimization, compiler-directed thread proxy optimization and topology-aware collective communications optimization are presented. The performance of the proposed method is evaluated with the Miro Benchmark and practical applications on Sunway Taihu Light computer system. Experimental results show that Parallel C language and the compile system have good performance and scalability to effectively support large-scale applications.