融合选择提取与子类聚类的快速Shapelet发现算法
作者:
作者简介:

赵超(1995-),男,陕西兴平人,硕士,主要研究领域为时间序列数据分析,云计算;潘丽(1982-),女,博士,副教授,博士生导师,CCF专业会员,主要研究领域为云计算,云制造,市场导向资源分配;王腾江(1977-),男,硕士,主要研究领域为企业管理软件,企业大数据,移动互联网;嵇存(1989-),男,博士,讲师,CCF专业会员,主要研究领域为时间序列数据分析,企业服务计算,制造服务系统配;刘士军(1972-),男,博士,教授,博士生导师,CCF高级会员,主要研究领域为服务计算,企业服务计算和制造数据分析.

通讯作者:

刘士军,E-mail:lsj@sdu.edu.cn;潘丽,E-mail:panli@sdu.edu.cn

基金项目:

国家自然科学基金(61872222);山东省重点研发计划(2018GGX101019);山东大学未来学者计划


Fast Shapelet Discovery Algorithm Combining Selective Extraction and Subclass Clustering
Author:
Fund Project:

National Natural Science Foundation of China (61872222); Key Research and Development Program of Shandong Province (2018GGX101019); Young Scholars Program of Shandong University

  • 摘要
  • | |
  • 访问统计
  • |
  • 参考文献 [34]
  • |
  • 相似文献 [20]
  • | | |
  • 文章评论
    摘要:

    基于Shapelet的时间序列分类算法具有可解释性,且分类准确率高、分类速度快.在这些算法中,Shapelet学习算法不依赖于单一分类器,能够学习出不在原始时间序列中的Shapelet,可以取得较高的分类准确率,同时还可以保证Shapelet发现和分类器构建同时完成;但如果产生的Shapelet过多,会增加依赖参数,导致训练时间太长,分类速度低,动态更新困难,且相似重复的Shapelet会降低分类的可解释性.提出一种选择性提取方法,用于更精准地选择Shapelet候选集,并改变学习方法以加速Shapelet学习过程;方法中提出了两个优化策略,通过对原始训练集采用时间序列聚类,可以得到原始时间序列中没有的Shapelet,同时在选择性提取算法中加入投票机制,以解决产生Shapelet过多的问题.实验表明,该算法在保持较高准确率的同时,可以显著地提高训练速度.

    Abstract:

    The time series classification algorithm based on Shapelet has the characteristics of interpretability, high classifica-tion accuracy and fast classification speed. Among these Shapelet-based algorithms, learning Shapelet algorithm does not rely on a single classifier, and Shapelet that is not in the original time series can be learned, which can achieve a high classification accuracy and ensure that Shapelet discovery and classifier construction are completed at the same time. However, if too many Shapelets are generated, it will increase the dependent parameters, resulting in too long training time, low classification speed, and difficult dynamic updates. And similar redundancy Shapelets will reduce the interpretability of the classification. This study proposes a new selective extraction algorithm to select Shapelet candidate set and change the learning method to accelerate the learning process of Shapelet and puts forward two optimization strategies. By using time series clustering for the original training set, Shapelets not in the original time series can be obtained. Meanwhile, a voting mechanism is added into the selective extraction algorithm to solve the problem of excessive Shapelet generation. Experiments show that the proposed algorithm can improve the training speed while maintaining high accuracy.

    参考文献
    [1] Ye L, Keogh E. Time series Shapelets:A new primitive for data mining. In:Proc. of the 15th ACM SIGKDD Int'l Conf. on Knowledge Discovery and Data Mining. New York:ACM. 2009. 947-956.[doi:10.1145/1557019.1557122]
    [2] Yuan JD, Wang ZH, Han M. Shapelet pruning and Shapelet coverage for time series classification. Ruan Jian Xue Bao/Journal of Software, 2015,26(9):2311-2325(in Chinese with English abstract). http://www.jos.org.cn/1000-9825/4702.htm[doi:10.13328/j.cnki.jos.004702]
    [3] Yan WH, Li GL. Research on time series classification based on Shapelet. Computer Science, 2019,46(1):36-42(in Chinese with English abstract).
    [4] Rakthanmanon T, Keogh E. Fast Shapelets:A scalable algorithm for discovering time series Shapelets. In:Proc. of the 2013 SIAM Int'l Conf. on Data Mining. Philadelphia:Society for Industrial and Applied Mathematics, 2013. 668-676.
    [5] Lines J, Davis LM, Hills J, Bagnall A. A Shapelet transform for time series classification. In:Proc. of the 18th ACM SIGKDD Int'l Conf. on Knowledge Discovery and Data Mining. New York:ACM, 2012. 289-297.[doi:10.1145/2339530.2339579]
    [6] Hills J, Lines J, Baranauskas E, Mapp J, Bagnall A. Classification of time series by Shapelet transformation. Data Mining and Knowledge Discovery, 2014,28(4):851-881.[doi:10.1007/s10618-013-0322-1]
    [7] Grabocka J, Schilling N, Wistuba M, Schmidt-Thieme L. Learning time-series Shapelets. In:Proc. of the 20th ACM SIGKDD Int'l Conf. on Knowledge Discovery and Data Mining. New York:ACM, 2014. 392-401.[doi:10.1145/2623330.2623613]
    [8] Mueen A, Keogh E, Young N. Logical-Shapelets:An expressive primitive for time series classification. In:Proc. of the 17th ACM SIGKDD Int'l Conf. on Knowledge Discovery and Data Mining. New York:ACM, 2011. 1154-1162.
    [9] Chang KW, Deka B, Hwu WMW, Roth D. Efficient pattern-based time series classification on GPU. In:Proc. of the 2012 IEEE 12th Int'l Conf. on Data Mining. New York:IEEE, 2012. 131-140.[doi:10.1109/ICDM.2012.132]
    [10] He Q, Zhi D, Zhuang F, Shang T, Shi Z. Fast time series classification based on infrequent Shapelets. In:Proc. of the 2012 11th Int'l Conf. on Machine Learning and Applications. New York:IEEE, 2012. 215-219.[doi:10.1109/ICMLA.2012.44]
    [11] Wistuba M, Grabocka J, Schmidt-Thieme L. Ultra-fast Shapelets for time series classification. ArXiv preprint arXiv:1503.05018, 2015.
    [12] Hou L, Kwok JT, Zurada JM. Efficient learning of timeseries Shapelets. In:Proc. of the 30th AAAI Conf. on Artificial Intelligence. Palo Alto:AAAI, 2016. 1209-1215.
    [13] Yang Y, Deng Q, Shen F, Zhao J, Luo C. A Shapelet learning method for time series classification. In:Proc. of the 2016 IEEE 28th Int'l Conf. on Tools with Artificial Intelligence. New York:IEEE, 2016. 423-430.[doi:10.1109/ICTAI.2016.68]
    [14] Yeh CCM, Zhu Y, Ulanova L, Begum N, Ding Y. Matrix profile I:All pairs similarity joins for time series:A unifying view that includes motifs, discords and shapelets. In:Proc. of the 2016 IEEE 16th Int'l Conf. on Data Mining. New York:IEEE, 2016. 1317-1322.[doi:10.1109/ICDM.2016.89]
    [15] Yeh CCM, Zhu Y, Ulanova L, Begum N, Ding Y, Dau HA, Zimmerman Z, Silva DF, Mueen A, Keogh E. Time series joins, motifs, discords and Shapelets:A unifying view that exploits the matrix profile. Data Mining and Knowledge Discovery, 2017,32(1):83-123.[doi:10.1007/s10618-017-0519-9]
    [16] Zhao C, Liu S, Pan L, Ji C, Yang C. Selecting superior candidates from a suitable set:A selective extraction algorithm for accelerating Shapelet discovery in time series data. In:Proc. of the 2019 IEEE 23rd Int'l Conf. on Computer Supported Cooperative Work in Design. New York:IEEE, 2019. 404-409.
    [17] Dau HA, Keogh E, Kamgar K, Yeh CCM, Zhu Y, Gharghabi S, Ratanamahatana CA, Chen Y, Hu B, Begum N, Bagnall A, Mueen A, Batista G. The UCR time series classification archive. URL, 2018. https://www.cs.ucr.edu/~eamonn/time_series_data_2018/
    [18] Sathianwiriyakhun P, Janyalikit T, Ratanamahatana CA. Fast and accurate template averaging for time series classification. In:Proc. of the 2016 8th Int'l Conf. on Knowledge and Smart Technology. New York:IEEE, 2016. 49-54.
    [19] Ji C, Zhao C, Pan L, Liu S, Yang C, Wu L. A fast Shapelet discovery algorithm based on important data points. Int'l Journal of Web Services Research, 2017,14(2):67-80.[doi:10.4018/IJWSR.2017040104]
    [20] Ji C, Liu S, Yang C, Pan L, Wu L, Meng X. A Shapelet selection algorithm for time series classification:New directions. Procedia Computer Science, 2018,129:461-467.[doi:10.1016/j.procs.2018.03.025]
    [21] Ji C, Zhao C, Liu S, Yang C, Pan L, Wu L, Meang X. A fast Shapelet selection algorithm for time series classification. Computer Networks, 2019,148:231-240.[doi:10.1016/j.comnet.2018.11.031]
    [22] Zhao C. A fast time series Shapelet discovery algorithm combining selective extraction and subclass clustering[MS. Thesis]. Ji'nan:Shandong University, 2019(in Chinese with English abstract).
    [23] Furao S, Hasegawa O. An incremental network for on-line unsupervised classification and topology learning. Neural Networks, 2006,19(1):90-106.[doi:10.1016/j.neunet.2005.04.006]
    [24] Furao S, Ogura T, Hasegawa O. An enhanced self-organizing incremental neural network for online unsupervised learning. Neural Networks, 2007,20(8):893-903.[doi:10.1016/j.neunet.2007.07.008]
    [25] Keogh E, Chu S, Hart D, Pazzani M. Segmenting time series:A survey and novel approach. In:Proc. of the Data Mining in Time Series Databases. 2004. 1-21.[doi:10.1142/9789812565402_0001]
    [26] Ji C. Time series classification methods based on Shapelet[Ph.D. Thesis]. Ji'nan:Shandong University, 2017(in Chinese with English abstract).
    [27] Bagnall A, Bostrom A, Large J, Lines J. The great time series classification bake off:An experimental evaluation of recently proposed algorithms. Extended version. arXiv preprint arXiv:1602.01711, 2016.
    [28] Bagnall A, Lines J, Bostrom A, Large J, Keogh E. The great time series classification bake off:A review and experimental evaluation of recent algorithmic advances. Data Mining and Knowledge Discovery, 2017,31(3):606-660.[doi:10.1007/s10618-016-0483-9]
    [29] Ji C, Zhao C, Pan L, Liu S, Yang C, Meng X. A just-in-time shapelet selection service for online time series classification. Computer Networks, 2019,157:89-98.[doi:/10.1016/j.comnet.2019.04.020]
    附中文参考文献:
    [2] 原继东,王志海,韩萌.基于Shapelet剪枝和覆盖的时间序列分类算法.软件学报,2015,26(9):2311-2325. http://www.jos.org.cn/1000-9825/4702.htm[doi:10.13328/j.cnki.jos.004702]
    [3] 闫汶和,李桂玲.基于shapelet的时间序列分类研究.计算机科学,2019,46(1):36-42.
    [22] 赵超.融合选择性提取与子类聚类的快速时间序列Shapelet发现算法[硕士学位论文].济南:山东大学,2019.
    [26] 嵇存.基于Shapelet的时间序列分类方法研究[博士学位学位].济南:山东大学,2017.
    引证文献
    网友评论
    网友评论
    分享到微博
    发 布
引用本文

赵超,王腾江,刘士军,潘丽,嵇存.融合选择提取与子类聚类的快速Shapelet发现算法.软件学报,2020,31(3):763-777

复制
分享
文章指标
  • 点击次数:2798
  • 下载次数: 6830
  • HTML阅读次数: 4331
  • 引用次数: 0
历史
  • 收稿日期:2019-08-12
  • 最后修改日期:2019-11-25
  • 在线发布日期: 2020-01-10
  • 出版日期: 2020-03-06
文章二维码
您是第20434697位访问者
版权所有:中国科学院软件研究所 京ICP备05046678号-3
地址:北京市海淀区中关村南四街4号,邮政编码:100190
电话:010-62562563 传真:010-62562533 Email:jos@iscas.ac.cn
技术支持:北京勤云科技发展有限公司

京公网安备 11040202500063号