Improvement of Time Series Segmentation Technology Based on Matrix Profile

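Author biographies:

Liu Hehe (刘贺贺, 1994-), male, master's degree; research interests: machine learning and time series data analysis. He Yanqiao (贺延俏, 1996-), female, master's degree; research interests: machine learning and data mining. Deng Shizhuo (邓诗卓, 1990-), female, Ph.D., CCF professional member; research interests: big data analysis, machine learning, and time series data analysis. Wu Gang (吴刚, 1978-), male, Ph.D., associate professor, CCF professional member; research interests: in-memory databases, graph databases, and knowledge graphs. Wang Botao (王波涛, 1968-2021), male, Ph.D., professor, doctoral supervisor, CCF professional member; research interests: cloud computing, big data, location-based services, privacy protection, and time series data analysis.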

Corresponding author:

Wu Gang (吴刚), wugang@cse.neu.edu.cn

Chinese Library Classification (CLC) number:

TP311

Funding:

Guangdong Basic and Applied Basic Research Foundation (2021A1515110761); Fundamental Research Funds for the Central Universities (N2104002, N2016009)


    Abstract:

    Time series segmentation is an important research direction in data mining. Segmentation techniques based on the matrix profile (MP) have attracted growing attention from researchers and have produced good results. However, the technique and its derivative algorithms still have shortcomings. First, when the MP-based fast low-cost semantic segmentation algorithm segments a time series for a given activity state, nearest neighbors are connected by arcs, and an arc may cross a non-target activity state to match a similar subsequence. Second, the existing segmentation point extraction algorithm uses a window of a given length, which tends to yield segmentation points that deviate substantially from the true values and thus reduces accuracy. To address these problems, this study proposes a time series segmentation algorithm that limits arc crossing, namely limit arc curve cross-FLOSS (LAC-FLOSS). The algorithm adds weights to arcs to form weighted arcs and sets a matching distance threshold to resolve the subsequence mismatches caused by arcs crossing states. In addition, an improved segmentation point extraction algorithm, improved extract regimes (IER), is proposed. Based on the shape properties of the corrected arc crossings (CAC) sequence, it extracts extrema from the troughs, which avoids picking segmentation points at non-inflection points, as can happen when a window is applied directly. Comparative experiments on the public datasets datasets_seg and MobiAct verify the feasibility and effectiveness of the two solutions.
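
The two ideas in the abstract can be illustrated with a minimal Python sketch. It is not the authors' LAC-FLOSS or IER implementation: the abstract does not give the arc weighting scheme, so a hard cutoff on nearest-neighbor distance stands in for it here, and the arc curve is corrected by dividing by an idealized parabola, which is an assumption of this sketch. All function names, `max_dist`, and `level` are hypothetical, and the brute-force nearest-neighbor search is only suitable for short series.

```python
import numpy as np


def nearest_neighbors(ts, m):
    """Brute-force nearest neighbor (z-normalized) of every length-m subsequence."""
    ts = np.asarray(ts, dtype=float)
    subs = np.lib.stride_tricks.sliding_window_view(ts, m)
    subs = (subs - subs.mean(axis=1, keepdims=True)) / (
        subs.std(axis=1, keepdims=True) + 1e-12
    )
    dist = np.linalg.norm(subs[:, None, :] - subs[None, :, :], axis=2)
    n, excl = len(subs), m // 2
    for i in range(n):
        # Exclusion zone: forbid trivial matches with overlapping neighbors.
        dist[i, max(0, i - excl):i + excl + 1] = np.inf
    nn_idx = dist.argmin(axis=1)
    return nn_idx, dist[np.arange(n), nn_idx]


def thresholded_arc_curve(nn_idx, nn_dist, max_dist):
    """Count arcs crossing each position, skipping arcs with distance > max_dist.

    Assumption: a hard cutoff replaces the paper's weighted arcs.
    """
    n = len(nn_idx)
    diff = np.zeros(n + 1)
    for i, (j, d) in enumerate(zip(nn_idx, nn_dist)):
        if d > max_dist:
            continue  # weak match: likely an arc jumping across another state
        lo, hi = min(i, j), max(i, j)
        diff[lo] += 1
        diff[hi] -= 1
    return np.cumsum(diff)[:n]


def corrected_arc_curve(ac):
    """Divide by an idealized parabola (assumed correction) and clip to [0, 1]."""
    n = len(ac)
    i = np.arange(n)
    ideal = 2.0 * i * (n - i) / n
    cac = np.ones(n)
    ok = ideal > 0
    cac[ok] = np.minimum(ac[ok] / ideal[ok], 1.0)
    return cac


def valley_minima(cac, level=0.5):
    """Return the minimum of each contiguous run where cac < level."""
    below = np.append(np.asarray(cac) < level, False)
    points, start = [], None
    for i, b in enumerate(below):
        if b and start is None:
            start = i
        elif not b and start is not None:
            points.append(start + int(np.argmin(cac[start:i])))
            start = None
    return points
```

Chaining `nearest_neighbors`, `thresholded_arc_curve`, `corrected_arc_curve`, and `valley_minima` gives candidate segmentation points. Taking one minimum per trough, rather than sliding a fixed-length window, is the part that mirrors the motivation stated for IER; the threshold and level would need tuning per dataset.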

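Cite this article

Liu Hehe, He Yanqiao, Deng Shizhuo, Wu Gang, Wang Botao. Improvement of Time Series Segmentation Technology Based on Matrix Profile. Journal of Software (软件学报), 2023, 34(11): 5267-5281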

History
  • Received: 2021-06-11
  • Last revised: 2022-03-11
  • Published online: 2023-05-18
  • Publication date: 2023-11-06