Abstract: The Sunway TaihuLight supercomputer is suited to high-throughput computing, and, like other high-throughput systems, it has long memory access latency and long network latency. A large class of problems, known as time-to-solution problems, requires iterations at a high frequency. A typical time-to-solution application is molecular dynamics simulation, in which each time step depends on the results of the previous one, so the iterations are difficult to parallelize. The simulated time scale usually exceeds a microsecond, which means that more than 10^12 steps are required. Such a simulation cannot be completed in a reasonable time on a long-latency system. Therefore, the main performance bottleneck on the long-latency Sunway system is how to increase the iteration frequency. This study proposes a series of optimization strategies to improve the iteration frequency: (1) reducing communication overhead and network contention through single-core communication combined with on-chip synchronization; (2) speeding up synchronization between cores by having the computing processing elements wait on a shared memory variable; (3) reducing data dependencies by changing the computation patterns; (4) hiding memory access latency by overlapping computation and communication; (5) reorganizing the data structures to improve memory access efficiency.