Survey of Key Techniques of HTAP Databases
Fund Project: National Natural Science Foundation of China (61925205, 62072261)


    Abstract:

    Hybrid Transactional and Analytical Processing (HTAP) uses a single system to process mixed workloads of transactions and analytical queries simultaneously. It not only eliminates the Extract-Transform-Load (ETL) process that moves data from relational transactional databases into data warehouses, but also enables real-time analysis of the latest transactional data. However, to process OLTP and OLAP workloads together, an HTAP system must trade off performance (workload isolation) against data freshness. This is mainly because highly concurrent, short-lived OLTP workloads and bandwidth-intensive, long-running OLAP workloads have different access patterns and interfere with each other. Most mainstream HTAP databases support mixed workloads by maintaining a row store and a column store side by side. Because they target different application scenarios, their storage architectures and processing techniques differ. This survey first offers a comprehensive review of HTAP databases: we summarize their main application scenarios, introduce a taxonomy based on their storage architectures, and compare their pros and cons. Unlike previous surveys, which focus on HTAP databases with a single row or column storage format and on Spark-based loosely coupled HTAP systems, we focus on real-time HTAP databases with a row-column dual store. In particular, we distill their key techniques into four areas: data organization, data synchronization, query optimization, and resource scheduling. We also summarize techniques for building HTAP databases and existing HTAP benchmarks. Finally, we discuss open problems and future research directions for HTAP.

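Cite this article

Zhang Chao (张超), Li Guoliang (李国良), Feng Jianhua (冯建华), Zhang Jintao (张金涛). Survey of Key Techniques of HTAP Databases. Journal of Software (软件学报),,():0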

History
  • Received: 2022-02-18
  • Last revised: 2022-05-08
  • Published online: 2022-07-22