Time Series Symmetric Pattern Mining
Author:
Affiliation:

  • Article
  • | |
  • Metrics
  • |
  • Reference [24]
  • |
  • Related [20]
  • | | |
  • Comments
    Abstract:

    With the integration of informatization and industrialization, the Internet of Things and industrial Internet have flourished, resulting in a large amount of industrial big data represented by time series. There are many valuable patterns in time series, among which symmetric patterns are widespread in various time series. Mining symmetric patterns has important research value in the fields of behavior analysis, trajectory tracking, anomaly detection, etc. However, the data volume of time series is often as high as tens or even hundreds of gigabytes. It can take months or even years to mine symmetric patterns using a direct nested query algorithm, and typical acceleration techniques such as indexing, lower bounds, and triangular inequalities can only produce speedup of one or two orders of magnitude at most. Therefore, based on the inspiration of the dynamic time warping algorithm, this study proposes a method that can mine all the symmetric patterns of the time series within the time complexity of O(w×|T|). Specifically, given the symmetric pattern length constraint, the symmetric subsequences can be calculated based on the interval dynamic programming. Then the largest number of non-overlapping symmetric patterns can be selected according to the greedy strategy. In addition, we also study the algorithm for mining symmetric patterns in the time series data stream, and dynamically adjusts the window size according to the characteristics of the data in the window to ensure the integrity of the symmetric pattern data. Using one artificial data set and three real data sets to experiment with the above method under different data volumes, it can be seen from the experimental results that compared with other symmetric pattern mining methods, this method has better performance in terms of pattern mining results and time overhead.

    Reference
    [1] Mueen A, Keogh EJ, Zhu Q, Cash S, Westover MB. Exact discovery of time series motifs. In:Proc. of the SDM. 2009. 473-484.
    [2] Yeh CCM, Zhu Y, Ulanova L, Begum N, Ding YF, Dau HA, Silva DF, Mueen A, Keogh EJ. Matrix profile I:All pairs similarity joins for time series:A unifying view that includes motifs, discords and shapelets. In:Proc. of the ICDM. 2016. 1317-1322.
    [3] Ding H, Trajcevski G, Scheuermann P, Wang XY, Keogh EJ. Querying and mining of time series data:Experimental comparison of representations and distance measures. Proc. of the VLDB Endow, 2008, 1. 2):1542-1552.
    [4] Rakthanmanon T, Campana BJL, Mueen A, Batista GEAPA, Westover MB, Zhu Q, Zakaria J, Keogh EJ. Searching and mining trillions of time series subsequences under dynamic time warping. In:Proc. of the KDD. 2012. 262-270.
    [5] Song SX, Zhang AQ, Wang JM, Yu PS. SCREEN:Stream data cleaning under speed constraints. In:Proc. of the SIGMOD Conf. 2015. 827-841.
    [6] Gao F, Song SX, Wang JM. Time series data cleaning under multi-speed constraints. Ruan Jian Xue Bao/Journal of Software, 2021, 32(3):689-711. in Chinese with English abstract). http://www.jos.org.cn/1000-9825/6176.htm[doi:10.13328/j.cnki.jos.006176]
    [7] Hishinuma T, Hasegawa H, Tanaka T. SIMD parallel sparse matrix-vector and transposed-matrix-vector multiplication in DD precision. In:Proc. of the VECPAR. 2016. 21-34.
    [8] Wan SP. An efficient implementation of Manacher's algorithm. CoRR abs/2003.08211, 2020.
    [9] Song SX, Zhu H, Wang JM. Constraint-variance tolerant data repairing. In:Proc. of the SIGMOD Conf. 2016. 877-892.
    [10] Ester M, Kriegel HP, Sander J, Xu XW. A density-based algorithm for discovering clusters in large spatial databases with noise. In: Proc. of the KDD. 1996. 226-231.
    [11] Jenks GF. The Data Model Concept in Statistical Mapping. In:Int'l Yearbook of Cartography, Vol.7, 1967. 186-190.
    [12] Cheng YZ. Mean shift, mode seeking, and clustering. IEEE Trans. on Pattern Analysis and Machine Intelligence, 1995, 17(8): 790-799.
    [13] Butcher JC. Random sampling from the normal distribution. The Computer Journal, 1961, 3(4):251-253.
    [14] Chen ZW, Song SX, Wei ZH, Fang JY, Long J. Approximating median absolute deviation with bounded error. Proc. of the VLDB Endowment, 2021, 14(11):2114-2126.
    [15] Sammut C, Webb GI. Encyclopedia of Machine Learning and Data Mining. Boston:Springer, 2017.
    [16] Salvador S, Chan P. Toward accurate dynamic time warping in linear time and space. Intelligent Data Analysis, 2007, 11. 5): 561-580.
    [17] Cleveland, Robert B, Cleveland WS, Terpenning I. STL:A seasonal-trend decomposition procedure based on loess. Journal of Official Statistics, 1990, 6(1):3-73.
    [18] Yoon CE, O'Reilly O, Bergen KJ, et al. Earthquake detection through computationally efficient similarity search. Science Advances, 2015, 1. 11):Article No.e1501057.
    [19] Bayardo RJ, Ma Y, Srikant R. Scaling up all pairs similarity search. In:Proc. of the WWW. 2007. 131-140.
    [20] Zhu Y, Zimmerman Z, Senobari NS, Yeh CCM, Funning GJ, Mueen A, Brisk P, Keogh EJ. Matrix profile II:Exploiting a novel algorithm and GPUs to break the one hundred million barrier for time series motifs and joins. In:Proc. of the ICDM. 2016. 739-748.
    [21] Hastie T, Tibshirani R, Friedman JH, et al. The Elements of Statistical Learning:Data Mining, Inference, and Prediction. New York:Springer, 2009.
    [22] Dagum EB, Bianconcini S. Seasonal Adjustment Methods and Real Time Trend-cycle Estimation. Berlin, Heidelberg:Springer, 2016.
    附中文参考文献:
    [6] 高菲, 宋韶旭, 王建民. 多区间速度约束下的时序数据清洗方法. 软件学报, 2021, 32(3):689-711. http://www.jos.org.cn/1000-9825/6176.htm[doi:10.13328/j.cnki.jos.006176]
    Cited by
    Comments
    Comments
    分享到微博
    Submit
Get Citation

李盼盼,宋韶旭,王建民.时间序列对称模式挖掘.软件学报,2022,33(3):968-984

Copy
Share
Article Metrics
  • Abstract:1337
  • PDF: 4752
  • HTML: 3266
  • Cited by: 0
History
  • Received:June 30,2021
  • Revised:July 31,2021
  • Online: October 21,2021
  • Published: March 06,2022
You are the first2032438Visitors
Copyright: Institute of Software, Chinese Academy of Sciences Beijing ICP No. 05046678-4
Address:4# South Fourth Street, Zhong Guan Cun, Beijing 100190,Postal Code:100190
Phone:010-62562563 Fax:010-62562533 Email:jos@iscas.ac.cn
Technical Support:Beijing Qinyun Technology Development Co., Ltd.

Beijing Public Network Security No. 11040202500063