Abstract: Existing string matching algorithms typically set the sliding window size to the pattern length. This paper presents the Linear DAWG Matching (LDM) algorithm, which divides the text into ⌊n/m⌋ overlapping windows of length 2m-1, where n is the text length and m is the pattern length. Within each window, the algorithm tests all m candidate positions in a batch. It first searches for pattern prefixes from the middle of the window leftward with a reversed suffix automaton. If this search fails, it shifts directly to the next window; otherwise, it scans the corresponding suffixes forward with a finite automaton. Theoretical analysis shows that LDM has optimal time complexity in the worst case (O(n)), the best case (O(n/m)), and the average case (O(n(log_σ m)/m)), where σ is the alphabet size. Experimental comparison of LDM with existing algorithms validates the average-case claim for long patterns. It further shows that LDM is also efficient for short patterns over large alphabets. LDM is therefore suitable not only for off-line pattern matching but also for high-speed online pattern matching.
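The window scheme described in the abstract can be illustrated with a minimal sketch. It is not the LDM implementation: both automata are replaced by naive substring operations, so it reproduces only the control flow (pivot in the middle of each window, backward prefix search, forward suffix check) and none of the stated time bounds. The function name `ldm_window_sketch` and the 0-indexed pivot position `k*m - 1` are assumptions made for illustration.

```python
def ldm_window_sketch(text, pattern):
    """Report all start positions of pattern in text, window by window.

    Assumption: naive substring tests stand in for the reversed suffix
    automaton (backward phase) and the finite automaton (forward phase).
    """
    n, m = len(text), len(pattern)
    if m == 0 or n < m:
        return []
    matches = []
    # One window per pivot; every occurrence covers exactly one pivot,
    # so floor(n/m) windows of length 2m-1 cover all m*floor(n/m) candidates.
    for k in range(1, n // m + 1):
        pivot = k * m - 1  # middle of the window [pivot-m+1, pivot+m-1]
        # Backward phase: read leftward from the pivot and record every
        # length j such that pattern[:j] ends exactly at the pivot.
        prefix_lengths = []
        for j in range(1, m + 1):
            factor = text[pivot - j + 1:pivot + 1]
            if factor not in pattern:
                break  # not a factor of the pattern: stop scanning left
            if pattern.startswith(factor):
                prefix_lengths.append(j)
        # If no prefix was found, the loop below is empty and the search
        # moves straight to the next window.
        # Forward phase: check the remaining suffix to the right of the pivot.
        for j in prefix_lengths:
            start = pivot - j + 1
            if text[pivot + 1:start + m] == pattern[j:]:
                matches.append(start)
    return sorted(matches)


if __name__ == "__main__":
    print(ldm_window_sketch("abracadabra", "abra"))
```

In a real implementation the backward phase would read each text character once through the suffix automaton, rather than re-slicing the text for every length as this sketch does.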