Abstract:Sparse triangular solver (SpTRSV) is an important computation kernel in scientific computing. The irregular memory access pattern of SpTRSV makes efficient data reuse difficult to achieve. Structured grid problems possess special nonzero patterns. OnSW26010 processor, the major building block of Sunway Taihulight supercomputer, these patterns are often exploited during the task partitioning stage to facilitate on-chip reuse of computed unknowns. Software-based routing is usually employed to implement inter-thread communication. Routing incurs overhead and imposes certain restrictions on nonzero patterns. This study achieves on-chip data reuse without routing. The input problem is partitioned and mapped onto SW26010 such that threads with data dependencies are always connected by the register communication network. This enables direct thread communication and obviates routing. The proposed solver is described and it is tested over a variety of problems. In the experiments, the proposed solver sustains an average memory bandwidth utilization of 88.2% with peak efficiency reaching 94.5% (24.5 GB/s).