Survey of Key Techniques of HTAP Databases (HTAP数据库关键技术综述)

doi:10.13328/j.cnki.jos.006713

Journal of Software (软件学报), 2023, Vol. 34, No. 2, pp. 761-785. DOI:10.13328/j.cnki.jos.006713

Authors: Zhang Chao (张超), Li Guoliang (李国良), Feng Jianhua (冯建华), Zhang Jintao (张金涛)
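About the authors: Zhang Chao (张超, b. 1990), male, Ph.D., assistant researcher, CCF professional member; research interests: database and big data management techniques. Li Guoliang (李国良, b. 1980), male, Ph.D., professor, doctoral supervisor, CCF distinguished member; research interests: databases, big data analytics and mining, crowd computing. Feng Jianhua (冯建华, b. 1967), male, Ph.D., professor, doctoral supervisor, CCF distinguished member; research interests: databases, data security and privacy protection, information. Zhang Jintao (张金涛, b. 2000), male, master's student; research interests: techniques at the intersection of databases and machine learning.
Corresponding author: Li Guoliang, liguoliang@tsinghua.edu.cn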
Funding: National Natural Science Foundation of China (61925205, 62072261, 62232009)

Survey of Key Techniques of HTAP Databases


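Abstract (translated from Chinese):

Hybrid transactional analytical processing (HTAP) is a technique that handles transactional requests and analytical queries simultaneously within a single one-stop architecture. HTAP not only eliminates the extract-transform-load (ETL) process from relational transactional databases to data warehouses, but also supports real-time analysis of the latest transactional data. However, to process OLTP and OLAP together, an HTAP system must trade off system performance against the freshness of the data being analyzed, mainly because highly concurrent, low-latency OLTP and bandwidth-intensive, high-latency OLAP have different access patterns and interfere with each other. At present, mainstream HTAP databases mainly support hybrid transactional and analytical processing by keeping row and column stores side by side; because these databases target different business scenarios, their storage architectures and processing techniques differ. This survey first comprehensively reviews HTAP databases, summarizes their main application scenarios, strengths, and weaknesses, and classifies, summarizes, and compares them by storage architecture. Existing surveys focus on HTAP databases built on a single row or column storage format and on loosely coupled, Spark-based HTAP systems, whereas this survey focuses on real-time HTAP databases in which row and column stores coexist. In particular, it distills the key techniques of mainstream HTAP databases into four parts: data organization, data synchronization, query optimization, and resource scheduling. It also summarizes and analyzes techniques for building HTAP databases and the benchmarks used to evaluate them. Finally, it discusses future research directions and challenges for HTAP.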

Abstract:

Hybrid transactional analytical processing (HTAP) relies on a single system to process mixed workloads of transactions and analytical queries simultaneously. It not only eliminates the extract-transform-load (ETL) process, but also enables real-time analysis of the latest transactional data. Nevertheless, to process mixed OLTP and OLAP workloads, such systems must balance the trade-off between workload isolation and data freshness. This is mainly because highly concurrent, short-lived OLTP workloads and bandwidth-intensive, long-running OLAP workloads interfere with each other. Most existing HTAP databases combine a row store and a column store to support HTAP. Because different HTAP applications have different requirements, HTAP databases adopt disparate storage strategies and processing techniques. This study comprehensively surveys HTAP databases. It introduces a taxonomy of state-of-the-art HTAP databases according to their storage strategies and architectures, then summarizes and compares their pros and cons. Unlike previous surveys, which focus on single-format (row-only or column-only) systems and Spark-based loosely coupled HTAP systems, this study focuses on real-time HTAP databases with a row-column dual store. It then examines their key techniques in depth: data organization, data synchronization, query optimization, and resource scheduling. The existing HTAP benchmarks are also introduced. Finally, research challenges and open problems for HTAP are discussed.

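Cite this article

Zhang Chao, Li Guoliang, Feng Jianhua, Zhang Jintao. Survey of Key Techniques of HTAP Databases. Journal of Software (软件学报), 2023, 34(2): 761-785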

History

Received: 2022-02-18
Last revised: 2022-05-08
Published online: 2022-07-22
Publication date: 2023-02-06