国家自然科学基金(61772533, 61732014); 北京市自然科学基金(4192066)
GPU数据库近年来在学术界和工业界吸引了大量的关注. 尽管一些原型系统和商业系统(包括开源系统)开发了作为下一代的数据库系统, 但基于GPU的OLAP引擎性能是否真的超过CPU系统仍然存有疑问, 如果能够超越, 那什么样的负载/数据/查询处理模型更加适合, 则需要更深入的研究. 基于GPU的OLAP引擎有两个主要的技术路线: GPU内存处理模式和GPU加速模式. 前者将所有的数据集存储在GPU显存来充分利用GPU的计算性能和高带宽内存性能, 不足之处在于GPU容量有限的显存制约了数据集大小以及稀疏访问模式的数据存储降低GPU显存的存储效率. 后者只在GPU显存中存储部分数据集并通过GPU加速计算密集型负载来支持大数据集, 主要的挑战在于如何为GPU显存选择优化的数据分布和负载分布模型来最小化PCIe传输代价和最大化GPU计算效率. 致力于将两种技术路线集成到OLAP加速引擎中, 研究一个定制化的混合CPU-GPU平台上的OLAP框架OLAP Accelerator, 设计CPU内存计算、GPU内存计算和GPU加速3种OLAP计算模型, 实现GPU平台向量化查询处理技术, 优化显存利用率和查询性能, 探索GPU数据库的不同的技术路线和性能特征. 实验结果显示GPU内存向量化查询处理模型在性能和内存利用率两方面获得最佳性能, 与OmniSciDB和Hyper数据库相比性能达到3.1和4.2倍加速. 基于分区的GPU加速模式仅加速了连接负载来平衡CPU和GPU端的负载, 能够比GPU内存模式支持更大的数据集.
Graphics processing unit (GPU) databases have attracted a lot of attention from the academic and industrial communities in recent years. Although quite a few prototype systems and commercial systems (including open-source systems) have been developed as next-generation database systems, whether GPU-based online analytical processing (OLAP) engines really outperform central processing unit (CPU)-based systems is still in doubt. If they do, more in-depth research should be conducted on what kind of workload/data/query processing models are more appropriate. GPU-based OLAP engines have two major technical roadmaps: GPU in-memory processing mode and GPU-accelerated mode. The former stores all the datasets in the GPU device memory to take the best advantage of GPU’s computing power and high bandwidth memory. Its drawbacks are that the limited capacity of the GPU device memory restricts the dataset size and that memory-resident data in the sparse access mode reduces the storage efficiency of the GPU display memory. The latter only stores some datasets in the GPU device memory and accelerates computation-intensive workloads by GPU to support large datasets. The key challenges are how to choose the optimal data distribution and workload distribution models for the GPU device memory to minimize peripheral component interconnect express (PCIe) transfer overhead and maximize GPU’s computation efficiency. This study focuses on how to integrate these two technical roadmaps into the accelerated OLAP engine and proposes OLAP Accelerator as a customized OLAP framework for hybrid CPU-GPU platforms. In addition, this study designs three calculation models, namely, the CPU in-memory calculation model, the GPU in-memory calculation model, and the GPU-accelerated model for OLAP, and proposes a vectorized query processing technique for the GPU platform to optimize device memory utilization and query performance. Furthermore, the different technical roadmaps of GPU databases and corresponding performance characteristics are explored. The experimental results show that the vectorized query processing model based on GPU in-memory achieves the best performance and memory efficiency. The performance is 3.1 and 4.2 times faster than that achieved with the datasets OmniSciDB and Hyper, respectively. The partition-based GPU-accelerated mode only accelerates the join workloads to balance the workloads between the CPU and GPU ends and can support larger datasets than those the GPU in-memory mode can support.