Abstract: Parallel computing has become mainstream. In any parallel computing system, synchronization is a critical design element and is essential for fully exploiting hardware performance. In recent years, GPUs, the most widely used accelerators, have developed rapidly, and many applications place growing demands on GPU thread synchronization. However, current GPUs cannot support thread synchronization efficiently in many real-world applications. Although many approaches have been proposed and much progress has been made, the unique architecture and parallel execution pattern of GPUs still pose many challenges for GPU thread synchronization research. This study classifies thread synchronization in GPU parallel programming by synchronization purpose and granularity. First, focusing on how synchronization is expressed and executed, it analyzes the key problems and challenges of synchronization on GPUs: synchronization is difficult to express efficiently, is prone to concurrency bugs, and executes inefficiently. Second, it surveys recent academic and industrial research on synchronization for thread contention and synchronization for thread cooperation on GPUs, organized by synchronization granularity and covering two aspects: methods for expressing thread synchronization and methods for optimizing its performance. Third, it analyzes the existing research methods. On this basis, the study points out future research trends, development prospects, and feasible research methods for GPU thread synchronization, providing a reference for researchers in this field.