OLAP Optimization Techniques Based on GPU Database

doi:10.13328/j.cnki.jos.006739

Journal of Software, 2023, Vol. 34, No. 11, pp. 5205-5229

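Authors:
ZHANG Yan-Song (张延松): Key Laboratory of Data Engineering and Knowledge Engineering (Renmin University of China), Beijing 100872, China; School of Information, Renmin University of China, Beijing 100872, China
LIU Zhuan (刘专): Key Laboratory of Data Engineering and Knowledge Engineering (Renmin University of China), Beijing 100872, China; School of Information, Renmin University of China, Beijing 100872, China
HAN Rui-Chen (韩瑞琛): Key Laboratory of Data Engineering and Knowledge Engineering (Renmin University of China), Beijing 100872, China; School of Information, Renmin University of China, Beijing 100872, China
ZHANG Yu (张宇): National Satellite Meteorological Center, Beijing 100081, China
WANG Shan (王珊): Key Laboratory of Data Engineering and Knowledge Engineering (Renmin University of China), Beijing 100872, China; School of Information, Renmin University of China, Beijing 100872, China
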
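Author biographies: ZHANG Yan-Song (born 1973), male, Ph.D., associate professor; research interests include in-memory databases, GPU databases, and database techniques for new hardware. LIU Zhuan (born 1996), male, master's degree; research interests include GPU databases and in-memory databases. HAN Rui-Chen (born 1997), male, master's student; research interests include in-memory databases and databases on new hardware. ZHANG Yu (born 1977), female, Ph.D., senior engineer; research interests include data warehouses and OLAP. WANG Shan (born 1944), female, professor, CCF Fellow; research interests include databases, data warehouses, and big data management.
Corresponding author: ZHANG Yu, yuzhang@cma.gov.cn
CLC number: TP311
Funding: National Natural Science Foundation of China (61772533, 61732014); Beijing Natural Science Foundation (4192066)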

Abstract:

Graphics processing unit (GPU) databases have attracted considerable attention from academia and industry in recent years. Although a number of prototype and commercial systems (including open-source systems) have been developed as next-generation database systems, it remains unclear whether GPU-based online analytical processing (OLAP) engines really outperform central processing unit (CPU)-based systems and, if they do, which workloads, data, and query processing models suit them best; both questions need deeper study. GPU-based OLAP engines follow two main technical routes: the GPU in-memory processing mode and the GPU-accelerated mode. The former stores the entire dataset in GPU device memory to take full advantage of the GPU's computing power and high-bandwidth memory. Its drawbacks are that the limited capacity of GPU device memory restricts the dataset size and that data stored with sparse access patterns lowers the storage efficiency of device memory. The latter stores only part of the dataset in GPU device memory and uses the GPU to accelerate computation-intensive workloads, which allows it to support large datasets. Its main challenge is choosing data distribution and workload distribution models for GPU device memory that minimize peripheral component interconnect express (PCIe) transfer overhead and maximize GPU computation efficiency. This study integrates the two routes into one accelerated OLAP engine and proposes OLAP Accelerator, a customized OLAP framework for hybrid CPU-GPU platforms. It designs three OLAP computing models (CPU in-memory, GPU in-memory, and GPU-accelerated), implements a vectorized query processing technique on the GPU platform to optimize device memory utilization and query performance, and explores the technical routes and performance characteristics of GPU databases. The experimental results show that the GPU in-memory vectorized query processing model performs best in both query performance and memory utilization, achieving speedups of 3.1 and 4.2 times over OmniSciDB and Hyper, respectively. The partition-based GPU-accelerated mode offloads only the join workload to the GPU to balance the load between the CPU and GPU sides, and it supports larger datasets than the GPU in-memory mode.

Key words: hybrid CPU-GPU platform; GPU-accelerated OLAP; GPU in-memory OLAP; GPU vectorized processing model
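
The trade-off between the three computing models named in the abstract can be illustrated with a small sketch. This is not the selection logic of OLAP Accelerator, which the abstract does not describe. The function `choose_mode`, the `Mode` enum, and the `headroom` parameter are hypothetical names introduced here, and the rule itself (use the GPU in-memory mode only when the dataset fits in device memory) is an assumption drawn from the capacity limitation stated above.

```python
from enum import Enum


class Mode(Enum):
    """The three computing models named in the abstract (labels are illustrative)."""
    CPU_IN_MEMORY = "cpu_in_memory"
    GPU_IN_MEMORY = "gpu_in_memory"
    GPU_ACCELERATED = "gpu_accelerated"


def choose_mode(dataset_bytes: int, gpu_memory_bytes: int, headroom: float = 0.8) -> Mode:
    """Pick a computing model from dataset size and device memory capacity.

    `headroom` is a hypothetical fraction of device memory usable for base
    data; the remainder is assumed to be reserved for intermediate results.
    """
    if gpu_memory_bytes <= 0:
        # No usable GPU: fall back to CPU in-memory processing.
        return Mode.CPU_IN_MEMORY
    if dataset_bytes <= gpu_memory_bytes * headroom:
        # The whole dataset fits on the device: keep everything GPU-resident.
        return Mode.GPU_IN_MEMORY
    # The dataset exceeds device memory: keep it in host memory and offload
    # only the compute-intensive part (for example, joins) to the GPU.
    return Mode.GPU_ACCELERATED


GIB = 1024 ** 3
print(choose_mode(10 * GIB, 32 * GIB))   # Mode.GPU_IN_MEMORY
print(choose_mode(100 * GIB, 32 * GIB))  # Mode.GPU_ACCELERATED
print(choose_mode(10 * GIB, 0))          # Mode.CPU_IN_MEMORY
```

A real engine would likely also weigh PCIe transfer cost and per-query selectivity, which the abstract names as the main challenge of the GPU-accelerated mode; the sketch deliberately ignores both.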

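Cite this article

ZHANG Yan-Song, LIU Zhuan, HAN Rui-Chen, ZHANG Yu, WANG Shan. OLAP Optimization Techniques Based on GPU Database. Journal of Software (软件学报), 2023, 34(11): 5205-5229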

History

Received: 2022-03-15
Last revised: 2022-04-22
Published online: 2023-06-16
Published: 2023-11-06
