Multimodal Data Encoding and Compression in Apache IoTDB

Fund Project:

National Key Research and Development Program of China (2021YFB3300500); National Natural Science Foundation of China (62232005, 62021002, 62072265, 92267203); Beijing National Research Center for Information Science and Technology Young Innovation Fund (BNR2022RC01011)
    Abstract:

    Time-series data are widely used in industrial manufacturing, meteorology, shipping, electric power, vehicles, finance, and other fields, which has driven the rapid development of time-series database management systems. As data volumes grow and data modalities diversify, efficient storage and management become critical, and data encoding and compression become an increasingly important problem. Existing encoding methods and related systems either do not fully consider the characteristics of data in different modalities or do not apply certain time-series processing methods to the encoding problem. This study comprehensively describes the multimodal data encoding and compression methods in the Apache IoTDB time-series database system and their system implementation, particularly for application scenarios such as the industrial Internet of Things. The encoding methods cover multiple modalities, including timestamp data, numerical data, Boolean data, frequency-domain data, and text data, and exploit the characteristics of each modality, such as the approximately equal intervals between consecutive timestamps in the timestamp modality, to design targeted encoding schemes. In addition, data quality issues that may arise in practical applications are taken into account in the encoding algorithms. Experimental evaluation and analysis on multiple datasets, at both the encoding-algorithm level and the system level, validate the effectiveness of the proposed encoding and compression methods and their system implementation.

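Cite this article

He Wendi, Xia Tianrui, Song Shaoxu, Huang Xiangdong, Wang Jianmin. Multimodal Data Encoding and Compression in Apache IoTDB. Journal of Software (软件学报), 2024, 35(3): 1173-1193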

History
  • Received: 2023-07-17
  • Last revised: 2023-09-05
  • Published online: 2023-11-08
  • Publication date: 2024-03-06