Abstract:Pattern matching with wildcards is a classic problem, and matching with variable gap constraints is a popular direction in this field in recent years. In order to meet the requirement of high accuracy in some query applications, this paper proposes an algorithm (referred to as SGPM-SAI) to obtain a complete solution of pattern matching under the condition of sparse gaps constraint. SGPM-SAI firstly creates an index structure called W-SAM (wildcard suffix automation) for the preprocessed text, and then get EndPos collection for each pattern segmentation by searching string from W-SAM, and finally get the complete solution of pattern matching by means of EndPos sets intersection. Experimental results show that, regardless of pretreatment time, the performance of SGPM-SAI algorithm is at least 3~5 times higher than other competitive algorithms, such as KMP, BM, AC, suffix array. Compared with the latest version (SAIL-Gen) of SAIL algorithm, the performance of SGPM-SAI is significantly better under the condition of sparse gaps constraint. In addition, this paper introduces parallel process methods for SGPM-SAI algorithm so as to effectively utilize the massive parallel processing units of modern processors. Experimental results show that the acceleration of Parallel SGPM-SAI algorithm has significant effect, as well as good parallel scalability. This indicates that the presented method can take full advantage of the high parallel computation capability of modern many-core processors.