Time Series Data Cleaning under Multi-speed Constraints
Author:
Affiliation:

Clc Number:

Fund Project:

National Key Research and Development Plan (2019YFB1705301); National Natural Science Foundation of China (62072265, 61572272, 71690231)

  • Article
  • |
  • Figures
  • |
  • Metrics
  • |
  • Reference
  • |
  • Related
  • |
  • Cited by
  • |
  • Materials
  • |
  • Comments
    Abstract:

    As the basis of data management and analysis, data quality issues have increasingly become a research hotspot in related fields. Furthermore, data quality can optimize and promote big data and artificial intelligence technology. Generally, physical failures or technical defects in data collection and recorder will cause certain anomalies in collected data. These anomalies will have a significant impact on subsequent data analysis and artificial intelligence processes, thus, data should be processed and cleaned accordingly before application. Existing repairing methods based on smoothing will cause a large number of originally correct data points being over-repaired into wrong values. And the constraint-based methods such as sequential dependency and SCREEN cannot accurately repair data under complex conditions since the constraints are relatively simple. A time series data repairing method under multi-speed constraints is further proposed based on the principle of minimum repairing. Then, dynamic programming is used to solve the problem of data anomalies with optimal repairing. Specifically, multiple speed intervals are proposed to constrain time series data, and a series of repairing candidate points is formed for each data point according to the speed constraints. Next, the optimal repair solution is selected from these candidates based on the dynamic programming method. In order to verify the feasibility and effectiveness of this method, an artificial data set, two real data sets, and another real data set with real anomalies are used for experiments under different rates of anomalies and data sizes. It can be seen from the experimental results that, compared with the existing methods based on smoothing or constraints, the proposed method has better performance in terms of RMS error and time cost. In addition, the verification of clustering and classification accuracy with several data sets shows the impact of data quality on subsequent data analysis and artificial intelligence. The proposed method can improve the quality of data analysis and artificial intelligence results.

    Reference
    Related
    Cited by
Get Citation

高菲,宋韶旭,王建民.多区间速度约束下的时序数据清洗方法.软件学报,2021,32(3):689-711

Copy
Share
Article Metrics
  • Abstract:
  • PDF:
  • HTML:
  • Cited by:
History
  • Received:July 19,2020
  • Revised:September 03,2020
  • Adopted:
  • Online: January 21,2021
  • Published: March 06,2021
You are the firstVisitors
Copyright: Institute of Software, Chinese Academy of Sciences Beijing ICP No. 05046678-4
Address:4# South Fourth Street, Zhong Guan Cun, Beijing 100190,Postal Code:100190
Phone:010-62562563 Fax:010-62562533 Email:jos@iscas.ac.cn
Technical Support:Beijing Qinyun Technology Development Co., Ltd.

Beijing Public Network Security No. 11040202500063