Abstract: With the rapid development of heterogeneous systems, it is important for the compiler to enhance data locality and fully utilize the on-chip cache. However, the classic reuse distance criterion is platform-sensitive in heterogeneous systems, so the compiler needs a unified reuse distance calculation framework to describe and optimize data locality. This paper proposes relaxed reuse distance, together with a unified method for calculating it in OpenCL programs, as a criterion for data layout optimization. Relaxed reuse distance is calculated using heterogeneous execution models and statistical approximation. Experiments are conducted on Intel Xeon Phi, AMD Opteron CPU, and Tilera Tile-GX36, and the results show that this optimization achieves a speedup of at least 1.23x on average.