Short Time Series Group Compression and Merging Methods in Apache TsFile

Corresponding author:

Song Shaoxu (宋韶旭), E-mail: sxsong@tsinghua.edu.cn

Fund Project:

National Key Research and Development Program of China (2021YFB3300500); National Natural Science Foundation of China (62232005, 62021002, 62072265, 92267203); Science and Technology Project of the State Grid Corporation of China Headquarters (5700-202435261A-1-1-ZN)

    Abstract:

    Time series data are widely used in industrial manufacturing, meteorology, electric power, vehicles, and other fields, which has driven the development of time series database management systems. As more database systems migrate to the cloud and device-edge-cloud collaborative architectures become more common, the volume of data to be processed keeps growing. In scenarios such as device-edge-cloud collaboration and massive numbers of series, short synchronization cycles and frequent data flushing produce large numbers of short time series, which pose new challenges for database systems. Effective data management and compression methods can significantly improve storage performance, allowing database systems to store massive numbers of series. Apache TsFile is a columnar storage file format designed for time series workloads and plays an important role in database management systems such as Apache IoTDB. This paper describes the group compression and merging methods that Apache TsFile uses to handle large numbers of short time series, particularly in applications with very many series such as the industrial Internet of Things. The group compression method takes the data characteristics of short time series scenarios into account: by grouping devices, it improves metadata utilization, reduces file index size, reduces the number of short time series, and significantly improves compression. Validation on real-world datasets shows that the grouping method brings significant improvements in compression, reading, writing, and file merging, enabling better management of TsFile files in short time series scenarios.

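Cite this article

Liu Xingyu, Song Shaoxu, Huang Xiangdong, Wang Jianmin. Short Time Series Group Compression and Merging Methods in Apache TsFile. Journal of Software (软件学报), 2025, 36(3): 1-21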

History
  • Received: 2024-05-27
  • Last revised: 2024-08-19
  • Published online: 2024-09-13