Efficient Algorithm for Solving Strict Pattern Matching Under Nonoverlapping Condition
Author:
Affiliation:

Clc Number:

TP301

Fund Project:

National Key Research and Development Program of China (2016YFB1000901); National Natural Science Foundation of China (61976240, 61702157, 917446209); Graduate Student Innovation Program of Hebei Province (CXZZSS2019023)

  • Article
  • |
  • Figures
  • |
  • Metrics
  • |
  • Reference
  • |
  • Related
  • |
  • Cited by
  • |
  • Materials
  • |
  • Comments
    Abstract:

    Nonoverlapping conditional sequence pattern mining is a method of gap constrained sequence pattern mining. Compared with similar mining methods, this method is easier to find valuable frequent patterns. The core of the problem is to calculate the support (or the number of occurrences) of a pattern in the sequence, and then determine whether the pattern is frequent. The essence of calculating the support is the pattern matching under nonoverlapping condition. The current studies employ the iterative search to find a nonoverlapping occurrence, and then prune the useless nodes to calculate the support of the pattern. The computational time complexity of these algorithms is O(m×m×n×W), where m, n, and W are the pattern length, sequence length, and maximum gap, respectively. In order to improve the calculation speed of pattern matching under nonoverlapping condition, and effectively reduce sequence pattern mining time, this study proposes an efficient and effective algorithm, which converts the pattern matching problem into a NetTree, then starts from the minroot node of the NetTree, and adopts the backtracking strategy to iteratively search the leftmost child to calculate the nonoverlapping minimum occurrence. After pruning the occurrence on the NetTree, the problem can be solved without further searching and pruning invalid nodes. This study proves the completeness of the algorithm and reduces the time complexity to O(m×n×W). On this basis, the study continues to indicate that there are other three similar solving strategies for this problem, iteratively finds the leftmost parent path from the leftmost leaf, the rightmost child path from the rightmost root, and the rightmost parent path from the rightmost leaf. Extensively experimental results verify the efficiency of the proposed algorithm in this study, especially, the mining algorithm adopting this method can reduce the mining time.

    Reference
    Related
    Cited by
Get Citation

武优西,刘茜,闫文杰,郭磊,吴信东.无重叠条件严格模式匹配的高效求解算法.软件学报,2021,32(11):3331-3350

Copy
Share
Article Metrics
  • Abstract:
  • PDF:
  • HTML:
  • Cited by:
History
  • Received:June 10,2019
  • Revised:December 25,2019
  • Adopted:
  • Online: November 05,2021
  • Published: November 06,2021
You are the firstVisitors
Copyright: Institute of Software, Chinese Academy of Sciences Beijing ICP No. 05046678-4
Address:4# South Fourth Street, Zhong Guan Cun, Beijing 100190,Postal Code:100190
Phone:010-62562563 Fax:010-62562533 Email:jos@iscas.ac.cn
Technical Support:Beijing Qinyun Technology Development Co., Ltd.

Beijing Public Network Security No. 11040202500063