Abstract: Timeseries data is widely used in energy, manufacturing, finance, climate, and many other fields. Aggregate queries are common in timeseries analysis because they quickly summarize massive data. Storing precomputed metadata is an effective way to accelerate aggregate queries. However, most existing timeseries databases slice data into fixed time windows, which requires sorting and partitioning the data in real time. In IoT applications with high write concurrency and throughput, these additional costs are unacceptable. This study proposes a physical metadata management solution in Apache IoTDB for accelerating aggregate queries, in which data are sliced according to the physical storage sharding of files. Both synchronous and asynchronous computing of metadata are adopted so that write performance keeps priority over queries. Out-of-order data streams are another major challenge in IoTDB applications. This study abstracts files with overlapping time ranges into out-of-order file groups and provides metadata for each group. Aggregate queries are then rewritten into three sub-queries and executed efficiently on physical metadata and timeseries data. Experiments on various datasets show that the proposed solution improves aggregate query performance and confirm the validity of the different computing strategies.
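The grouping and query-rewriting idea can be illustrated with a minimal sketch. This is not IoTDB code: `DataFile`, `FileGroup`, `build_groups`, and `aggregate` are hypothetical names, files are modeled as in-memory lists of points, and timestamps are assumed to be unique across files. The abstract does not say what the three sub-queries are; the sketch assumes one plausible reading, in which groups fully covered by the query range are answered from metadata, while the partially covered groups at the head and tail of the range are answered by scanning raw data.

```python
from dataclasses import dataclass
from typing import List, Tuple

Point = Tuple[int, float]  # (timestamp, value)


@dataclass
class DataFile:
    """Hypothetical stand-in for one physical storage file."""
    points: List[Point]

    @property
    def start(self) -> int:
        return min(t for t, _ in self.points)

    @property
    def end(self) -> int:
        return max(t for t, _ in self.points)


@dataclass
class FileGroup:
    """Files with overlapping time ranges plus precomputed metadata."""
    files: List[DataFile]
    start: int
    end: int
    count: int
    total: float


def make_group(files: List[DataFile]) -> FileGroup:
    # Assumes timestamps are unique across files, so counts and sums add up.
    return FileGroup(
        files=files,
        start=min(f.start for f in files),
        end=max(f.end for f in files),
        count=sum(len(f.points) for f in files),
        total=sum(v for f in files for _, v in f.points),
    )


def build_groups(files: List[DataFile]) -> List[FileGroup]:
    """Merge files whose time ranges overlap into disjoint groups."""
    groups: List[FileGroup] = []
    current: List[DataFile] = []
    current_end = 0
    for f in sorted(files, key=lambda f: f.start):
        if current and f.start <= current_end:
            current.append(f)
            current_end = max(current_end, f.end)
        else:
            if current:
                groups.append(make_group(current))
            current, current_end = [f], f.end
    if current:
        groups.append(make_group(current))
    return groups


def aggregate(groups: List[FileGroup], lo: int, hi: int) -> Tuple[int, float]:
    """Return (count, sum) of values with lo <= timestamp <= hi."""
    count, total = 0, 0.0
    for g in groups:
        if g.end < lo or g.start > hi:
            continue  # group lies outside the query range
        if lo <= g.start and g.end <= hi:
            # Fully covered: answer from metadata without reading data.
            count += g.count
            total += g.total
        else:
            # Partially covered (head or tail of the range): scan raw data.
            for f in g.files:
                for t, v in f.points:
                    if lo <= t <= hi:
                        count += 1
                        total += v
    return count, total


files = [
    DataFile([(1, 1.0), (5, 2.0)]),
    DataFile([(3, 3.0), (8, 4.0)]),   # overlaps the first file
    DataFile([(10, 5.0), (12, 6.0)]),
    DataFile([(20, 7.0)]),
]
groups = build_groups(files)
result = aggregate(groups, 4, 20)
```

Because the groups are disjoint in time, at most two of them can be partially covered by any query range, so the raw-data scans stay bounded regardless of how many groups the range spans.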