Abstract:K-clique enumeration is an important problem in subgraph matching, and the bitmap algorithm has been proven to be an effective method for solving the K-clique enumeration problem. Currently, state-of-the-art K-clique enumeration algorithms are accelerated by GPU. Previous studies have not investigated the impact of sparsity in real-world graph data on bitmap-based K-clique enumeration algorithms. Instead, static parallelization methods and bitmap construction strategies are commonly used on GPU, which result in low computational efficiency. This study proposes a thread-parallel load-balancing scheduling algorithm for bitmap tasks, which resolves the thread divergence problem while achieving high parallelism in the bitmap algorithm. Furthermore, it introduces a dynamic bitmap construction algorithm, enabling bitmaps to be constructed and activated at appropriate times for efficient execution of the bitmap algorithm. A GPU-friendly K-clique enumeration system, KCMiner, is implemented, which adaptively selects optimization strategies for K-clique enumeration tasks. Experimental results on GPU platforms show that the proposed method achieves up to 7.36 times speedup over the baseline K-clique enumeration algorithm and up to 30.2 times speedup over the baseline subgraph matching system.