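1. School of Data Science and Engineering, East China Normal University; 2. OceanBase, Ant Group; 3. The Fifth Electronics Research Institute of the Ministry of Industry and Information Technology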
The National Natural Science Foundation of China (General Program, Key Program, Major Program)
The demand for efficient real-time analysis of data that has just been modified in a database system has driven the rapid development of Hybrid Transactional/Analytical Processing (HTAP) database systems, which support both OLTP and OLAP workloads. Given the many HTAP database systems now available, defining and implementing benchmarks that evaluate their new features is crucial for fair comparison and healthy development. In this paper, we first analyze the key features of HTAP database systems and summarize the key technologies used to implement them. We then identify the difficulties of designing HTAP database systems and the challenges of building HTAP benchmarks, and on this basis propose the design dimensions an HTAP benchmark should consider: data generation, workload generation, evaluation metrics, and architecture supportability. We compare existing HTAP benchmarks in terms of these design dimensions and their implementation techniques, and summarize their strengths and weaknesses in each dimension. In addition, we run typical publicly available benchmarks to demonstrate and analyze their ability to evaluate the key features of HTAP database systems and to support horizontal comparison across different HTAP database systems. Finally, we summarize the capabilities required of HTAP benchmarks and point out future research directions: semantically consistent workload control and metrics for fresh data access are the key open problems in defining HTAP benchmarks.