Short Time Series Group Compression and Merging Method in Apache TsFile

    Abstract:

    Time-series data are widely used in fields such as industrial manufacturing, meteorology, electric power, and vehicles, which has spurred the development of time-series database management systems. As more database systems migrate to the cloud and end-cloud collaborative architectures become more common, the volume of data to be processed keeps growing. In scenarios such as end-cloud collaboration and massive numbers of time series, short synchronization cycles, frequent data flushing, and similar factors generate a large number of short time series, which poses new challenges for database systems. Efficient data management and compression methods can significantly improve storage performance, enabling database systems to store massive numbers of time series. Apache TsFile is a columnar storage file format designed specifically for time-series scenarios, and it plays an important role in database management systems such as Apache IoTDB. This study describes the group compression and merging methods used in Apache TsFile to handle large numbers of short time series, particularly in applications with a vast number of series such as the Industrial Internet of Things. The group compression method takes into account the data characteristics of short time series. By grouping devices, it improves metadata utilization, reduces file index size, reduces the number of short time series, and significantly improves compression. Validation on real-world datasets shows that the proposed grouping method yields significant improvements in compression, reading, writing, and file merging, enabling better management of TsFiles in short time series scenarios.

Get Citation

Liu Xingyu, Song Shaoxu, Huang Xiangdong, Wang Jianmin. Short Time Series Group Compression and Merging Method in Apache TsFile. Journal of Software, 2025, 36(3): 1-21

History
  • Received: May 27, 2024
  • Revised: July 16, 2024
  • Online: September 13, 2024