Abstract: Time-series data are widely used in industrial manufacturing, meteorology, ships, electric power, vehicles, finance, and other fields, which has driven the rapid development of time-series database management systems. As data scales grow and data modalities become more diverse, storing and managing the data efficiently is critical, and data encoding and compression are increasingly important research topics. Existing data encoding methods and systems do not thoroughly consider the characteristics of data in different modalities, and some time-series analysis methods have not yet been applied to data encoding. This study comprehensively introduces the multimodal data encoding methods and their system implementation in the Apache IoTDB time-series database system, with a focus on industrial Internet of Things application scenarios. The proposed encoding methods cover multiple modalities, including timestamp data, numerical data, Boolean data, frequency-domain data, and text data, among others. They explore and exploit the characteristics of each modality, especially the approximate regularity of timestamp intervals in the timestamp modality, to design targeted encodings. The encoding algorithms also take into account the data quality issues that may occur in practical applications. Experimental evaluation and analysis at both the encoding-algorithm level and the system level over multiple datasets validate the effectiveness of the proposed encoding methods and their system implementation.