Data Sharing Model and Optimization Strategies in HTAP Database Systems

doi:10.13328/j.cnki.jos.006901


Journal of Software, 2024, Vol. 35, No. 6, pp. 2951-2973. DOI:10.13328/j.cnki.jos.006901


Authors: HU Zi-Rui, WENG Si-Yang, WANG Qing-Shuai, YU Rong, XU Jin-Kai, ZHANG Rong, ZHOU Xuan

Affiliation (all authors): Shanghai Engineering Research Center of Big Data Management, East China Normal University, Shanghai 200062, China; School of Data Science and Engineering, East China Normal University, Shanghai 200062, China

Author biographies:
HU Zi-Rui (2001-), male, Ph.D. student. His research interests include HTAP database benchmarking and intelligent databases.
WENG Si-Yang (2000-), male, Ph.D. student. His research interests include database benchmarking and database workload simulation.
WANG Qing-Shuai (1997-), male, Ph.D. student. His research interests include application-oriented database workload simulation and benchmarking of new types of databases.
YU Rong (1999-), female, master's student. Her research interests include HTAP database benchmarking and database systems.
XU Jin-Kai (1999-), male, master's student. His research interests include database benchmarking.
ZHANG Rong (1978-), female, Ph.D., professor, doctoral supervisor, CCF professional member. Her research interests include distributed data management, database benchmarking, and data stream management.
ZHOU Xuan (1979-), male, Ph.D., professor, doctoral supervisor, CCF professional member. His research interests include database systems and big data processing technology.
Corresponding author: ZHANG Rong, E-mail: rzhang@dase.ecnu.edu.cn
CLC number: TP311
Funding: National Natural Science Foundation of China (62072179); 2021 CCF-Huawei Database Innovation Research Program

Data Sharing Model and Optimization Strategies in HTAP Database Systems



Abstract:

Hybrid transactional/analytical processing (HTAP) database systems have gained wide acceptance because they can process mixed workloads, i.e., transactions and analytical queries, within a single system. To avoid degrading the write performance of online transactional processing (OLTP), HTAP database systems typically support online analytical processing (OLAP) by maintaining multiple data versions or additional replicas, which introduces a consistency problem between the data versions on the TP and AP sides. Meanwhile, HTAP database systems face the core challenge of achieving efficient data sharing under resource isolation, and the design of a data-sharing model reflects a trade-off between the performance and the data freshness that the business requires. To systematically explain the data-sharing models and optimization strategies of existing HTAP database systems, this study first defines the data-sharing model through consistency models, based on the difference between the versions generated by TP and the versions queried by AP, and classifies the consistency models for HTAP data sharing into three categories: linear consistency, sequential consistency, and session consistency. It then walks through the whole process of the data-sharing model in terms of three core issues, namely data version identifier allocation, data version synchronization, and data version tracking, and presents the implementation methods of the different consistency models. Furthermore, it takes a dozen classic and popular HTAP database systems as examples to explain the concrete implementations in depth. Finally, it summarizes and analyzes the optimization strategies for the version synchronization, tracking, and recycling modules involved in data sharing, and discusses future optimization directions for data-sharing models. It concludes that an adaptive data synchronization scope, a self-tuning data synchronization cycle, and freshness-bound constraint control under sequential consistency are possible means of improving the performance and data freshness of HTAP database systems.

Key words: HTAP database system; consistency model; data management; hybrid workload; performance optimization
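
As a rough illustration only, and not taken from the paper, the three core issues named in the abstract (version identifier allocation, version synchronization, version tracking) and the three consistency levels can be sketched with a toy in-memory model. Everything here is an assumption made for the example: the class `ToyHTAP`, its methods, and the reading of the three levels (linear: the AP read sees every committed TP version; sequential: the AP read sees an ordered but possibly stale prefix; session: the AP read sees at least the session's own last write).

```python
from dataclasses import dataclass, field
from enum import Enum


class Consistency(Enum):
    LINEAR = "linear"
    SEQUENTIAL = "sequential"
    SESSION = "session"


@dataclass
class ToyHTAP:
    """Hypothetical toy model: a TP commit log plus a lagging AP replica."""
    log: list = field(default_factory=list)      # committed TP writes, in commit order
    replica: dict = field(default_factory=dict)  # AP-side copy of the data
    applied: int = 0                             # highest version applied to the replica

    def commit(self, key, value):
        # Version identifier allocation: a simple monotonic counter.
        version = len(self.log) + 1
        self.log.append((version, key, value))
        return version

    def sync(self, upto):
        # Version synchronization: apply log entries in commit order up to `upto`.
        upto = min(upto, len(self.log))
        for version, key, value in self.log[self.applied:upto]:
            self.replica[key] = value
            self.applied = version

    def required_version(self, level, session_version=0):
        # Version tracking: the lowest replica version a read may be served at.
        if level is Consistency.LINEAR:
            return len(self.log)      # must see every committed version
        if level is Consistency.SESSION:
            return session_version    # must see the session's own last write
        return self.applied           # SEQUENTIAL: any ordered prefix is acceptable

    def ap_read(self, key, level, session_version=0):
        need = self.required_version(level, session_version)
        if self.applied < need:
            self.sync(need)           # catch up before serving the read
        return self.replica.get(key)
```

In this sketch a sequential read never triggers synchronization, so it is the cheapest and the least fresh; a linear read always catches up to the newest committed version, which is the performance versus freshness trade-off the abstract refers to.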

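Cite this article

HU Zi-Rui, WENG Si-Yang, WANG Qing-Shuai, YU Rong, XU Jin-Kai, ZHANG Rong, ZHOU Xuan. Data sharing model and optimization strategies in HTAP database systems. Journal of Software, 2024, 35(6): 2951-2973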


History

Received: 2022-09-18
Last revised: 2022-11-11
Published online: 2023-07-05
Publication date: 2024-06-06
