• Article
  • | |
  • Metrics
  • |
  • Reference [17]
  • |
  • Related [20]
  • |
  • Cited by [14]
  • | |
  • Comments
    Abstract:

    Existing string matching algorithms typically set the sliding window size as the pattern length. This paper presents a Linear DAWG Matching (LDM) algorithm, which divides the text into [n/m] overlapping windows of length 2m-1. In the windows, the algorithm attempts at m positions in batches. It firstly searches pattern prefixes from middle to left with a reversed suffix automaton, shifts to next window directly when it fails, otherwise, scans the corresponding suffixes forward with a finite automaton. Theoretical analysis shows that LDM has optimal time complexities in the worst (O(m)), best (O(n/m)) and average cases (O(n(1ogσm)/m)). Experimental comparison of LDM with the existing algorithms validates this theoretical claims of average case for searching long patterns. It further reveals that LDM is also efficient for searching short patterns on large alphabets. Thus, LDM algorithm not only suits for off-line pattern matching, but also fits in high-speed online pattern matching.

    Reference
    [1]Charras C, Lecroq TT. Handbook of Exact String Matching Algorithms. London: King's College London Publications, 2004.
    [2]Knuth DE, Jr. Morris JH, Pratt VR. Fast pattern matching in strings. SIAM Journal on Computing, 1977,6(1):323-350.
    [3]Baeza-Yates RA, Gonnet GH. A new approach to text searching. Communications of the ACM, 1992,35(10):74-82.
    [4]Boyer RS, Moore JS. A fast string searching algorithm. Communications of the ACM, 1977,20(10):762-772.
    [5]Sunday DM. A very fast substring search algorithm. Communications of the ACM, 1990,33(8):132-142.
    [6]Wang YC, Chen GL, Han KS. Faster algorithm for single pattern matching. Journal of Shanghai Jiaotong University,2001,35(2): 192-196 (in Chinese with English abstract).
    [7]Cantone D, Faro S. Fast-Search: A new efficient variant of the Boyer-Moore string matching algorithm. In: Alberto A, Massimo M, eds. Proc. of the 2nd Int'l Workshop on Experimental and Efficient Algorithms (WEA 2003). Lecture Notes in Computer Science 2647, Heidelberg: Springer-Verlag, 2003.47-58.
    [8]Crochemore M, Rytter W. Text Algorithms. Oxford: Oxford University Press, 1994.
    [9]Lecroq T. A variation on the Boyer-Moore algorithm. Theoretical Computer Science, 1992,92(1):119-144.
    [10]Crochemore M, Hancart C. Automata for matching patterns. In: Rosenberg G, Salomaa A, eds. Handbook of Formal Languages,volume 2: Linear Modeling: Background and Application. Heidelberg: Springer-Verlag, 1997. 399-462.
    [11]Yao AC. The complexity of pattern matching for a random string. SIAM Journal on Computing, 1979,8(3):368-387.
    [12]Navarro G, Raffinot M. Fast and flexible string matching by combining bit-parallelism and suffix automata. ACM Journal of Experimental Algorithmics, 2000,5(4): 1-36.
    [13]He LT, Fang BX. Linear nondeterministic dawg string matching algorithm. In: Alberto A, Massimo M, eds. Proc. of the 11 th Int'l Symp. on String Processing and Information Retrieval (SPIRE 2004). Lecture Notes in Computer Science 3246, Heidelberg:Springer-Verlag, 2004. 70-71.
    [14]Raffinot M. On the multi backward dawg matching algorithm (MultiBDM). In: Baeza-Yates R, ed. Proc. of the 4th South American Workshop on String Processing. Valparaiso: Carleton University Press, 1997. 149-165.
    [15]Navarro G, Fredriksson K. Average complexity of exact and approximate multiple string matching. Theoretical Computer Science,2004,321 (2-3):283-290.
    [16]Navarro G. A guided tour to approximate string matching. ACM Computing Surveys, 2001,33(1):31-88.
    [17]王永成,陈桂林,韩客松.一种快速的多模式字符串匹配算法.上海交通大学学报,2001,35(2):192-196.
    Comments
    Comments
    分享到微博
    Submit
Get Citation

贺龙涛,方滨兴,余翔湛.一种时间复杂度最优的精确串匹配算法.软件学报,2005,16(5):676-683

Copy
Share
Article Metrics
  • Abstract:4393
  • PDF: 7171
  • HTML: 0
  • Cited by: 0
History
  • Received:September 22,2003
  • Revised:November 15,2004
You are the first2045224Visitors
Copyright: Institute of Software, Chinese Academy of Sciences Beijing ICP No. 05046678-4
Address:4# South Fourth Street, Zhong Guan Cun, Beijing 100190,Postal Code:100190
Phone:010-62562563 Fax:010-62562533 Email:jos@iscas.ac.cn
Technical Support:Beijing Qinyun Technology Development Co., Ltd.

Beijing Public Network Security No. 11040202500063