Abstract: Owing to their lower power consumption and cost, heterogeneous multi-core processors make up a major computing resource in current supercomputers. However, because such processors feature high bandwidth and loose memory consistency, programmers must attend to hardware details to obtain good memory and computation performance. This paper introduces CellMLP, a multi-level parallelism model for the Cell BE heterogeneous multi-core processor. By extending C with compiler directives, CellMLP supports the data parallelism, task parallelism and pipeline parallelism programming models and improves programming productivity. In addition, runtime optimizations are used to improve performance. Parallel data transfer across SPEs and a double-buffering mechanism are used to improve memory bandwidth. For task parallelism, a novel hybrid task queue supports asynchronous work stealing, reduces contention between SPE threads and increases scalability. For pipeline parallelism, low-overhead synchronization operations are implemented, for the first time, using the signal channels of Cell BE. Experiments are conducted on Stream, the NAS Benchmark, BOTS and other typical irregular applications. The results show that CellMLP supports a range of typical parallel applications efficiently. Compared with the similar programming models SARC and CellSs, CellMLP shows clear advantages in achieved data transfer bandwidth and in support for irregular applications.
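
To make the double-buffering idea mentioned above concrete, the following is a minimal sketch in plain C and not the CellMLP runtime itself. It assumes a hypothetical chunk size `CHUNK` and two hypothetical helpers, `fetch_chunk` and `wait_chunk`, which stand in for an asynchronous DMA transfer into the SPE local store and for waiting on its completion. Here they are implemented with `memcpy` and a no-op, so the code runs on any host but does not actually overlap transfer and computation.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define CHUNK 4 /* elements per local buffer; hypothetical, kept tiny for illustration */

/* Hypothetical stand-in for an asynchronous DMA "get" from main memory into
   local store. It is a plain memcpy here, so it completes immediately. */
static void fetch_chunk(float *local, const float *main_mem, size_t count)
{
    memcpy(local, main_mem, count * sizeof(float));
}

/* Hypothetical stand-in for waiting until the transfer into a given buffer
   has completed. It is a no-op because fetch_chunk is synchronous here. */
static void wait_chunk(int buffer_index)
{
    (void)buffer_index;
}

/* Number of elements in the chunk that starts at offset. */
static size_t chunk_len(size_t n, size_t offset)
{
    size_t remaining = n - offset;
    return remaining < CHUNK ? remaining : CHUNK;
}

/* Sum n floats using two local buffers: while the current buffer is being
   consumed, the transfer of the next chunk has already been issued. */
float double_buffered_sum(const float *data, size_t n)
{
    float buf[2][CHUNK];
    float sum = 0.0f;
    int cur = 0;

    assert(data != NULL || n == 0);
    if (n == 0)
        return sum;

    /* Prime the pipeline with the first chunk. */
    fetch_chunk(buf[cur], data, chunk_len(n, 0));

    for (size_t off = 0; off < n; off += CHUNK) {
        size_t len = chunk_len(n, off);
        size_t next_off = off + CHUNK;
        int nxt = cur ^ 1;

        /* Issue the transfer of the next chunk before computing. */
        if (next_off < n)
            fetch_chunk(buf[nxt], data + next_off, chunk_len(n, next_off));

        /* Make sure the current chunk has arrived, then consume it. */
        wait_chunk(cur);
        for (size_t i = 0; i < len; i++)
            sum += buf[cur][i];

        cur = nxt; /* swap the roles of the two buffers */
    }
    return sum;
}
```

On real Cell BE hardware the two helpers would presumably map to an MFC DMA request and a wait on its tag group, which is where the overlap of transfer and computation would come from. The buffer-swapping structure of the loop would stay the same.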