OLAP Optimization Techniques Based on GPU Database
Author biographies:

Zhang Yansong (1973-), male, PhD, associate professor; research interests: in-memory databases, GPU databases, new-hardware database technology. Liu Zhuan (1996-), male, master's degree; research interests: GPU databases, in-memory databases. Han Ruichen (1997-), male, master's student; research interests: in-memory databases, new-hardware databases. Zhang Yu (1977-), female, PhD, senior engineer; research interests: data warehousing, OLAP. Wang Shan (1944-), female, professor, CCF Fellow; research interests: databases, data warehousing, big data management.

Corresponding author:

Zhang Yu, yuzhang@cma.gov.cn

CLC number:

TP311

Funding:

National Natural Science Foundation of China (61772533, 61732014); Beijing Natural Science Foundation (4192066)



    Abstract:

    Graphics processing unit (GPU) databases have attracted considerable attention from academia and industry in recent years. Although quite a few prototype and commercial systems (including open-source systems) have been developed as next-generation database systems, whether GPU-based online analytical processing (OLAP) engines really outperform central processing unit (CPU)-based systems remains in doubt; if they do, what kinds of workload/data/query processing models suit them better calls for more in-depth research. GPU-based OLAP engines follow two major technical roadmaps: the GPU in-memory processing mode and the GPU-accelerated mode. The former stores all datasets in GPU device memory to take full advantage of the GPU's computing power and high-bandwidth memory; its drawbacks are that the limited capacity of GPU device memory restricts the dataset size and that memory-resident data under sparse access patterns reduces the storage efficiency of GPU device memory. The latter stores only part of the datasets in GPU device memory and uses the GPU to accelerate computation-intensive workloads so as to support large datasets; the key challenges are how to choose the optimal data-distribution and workload-distribution models for GPU device memory to minimize peripheral component interconnect express (PCIe) transfer overhead and maximize GPU computation efficiency. This study focuses on integrating the two technical roadmaps into an accelerated OLAP engine and proposes OLAP Accelerator, a customized OLAP framework for hybrid CPU-GPU platforms. It designs three OLAP calculation models, namely the CPU in-memory, GPU in-memory, and GPU-accelerated models, and proposes a vectorized query processing technique for the GPU platform to optimize device memory utilization and query performance. Furthermore, the different technical roadmaps of GPU databases and their performance characteristics are explored. The experimental results show that the GPU in-memory vectorized query processing model achieves the best performance and memory efficiency, running 3.1 and 4.2 times faster than OmniSciDB and Hyper, respectively. The partition-based GPU-accelerated mode accelerates only the join workloads to balance the load between the CPU and GPU sides and can support larger datasets than the GPU in-memory mode.
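As an illustration only (not the paper's implementation), the vectorized query processing model described in the abstract can be sketched on the CPU side: a star-schema fact table is scanned one fixed-size vector at a time, the foreign-key join is resolved by direct array-index reference into the dimension table rather than by a hash table, and the grouped aggregate is accumulated with a scatter-add. NumPy arrays stand in for GPU device-memory columns, and all table sizes, column names, and the vector width are hypothetical.

```python
import numpy as np

# Hypothetical star-schema fragment: one fact table (foreign-key column
# plus a measure) and one dimension table. Arrays stand in for columnar
# storage; on a GPU each step below would be one vectorized kernel.
rng = np.random.default_rng(0)
N_FACT, N_DIM, N_GROUP, VEC = 1_000_000, 100, 8, 4096

fact_fk = rng.integers(0, N_DIM, N_FACT)    # foreign keys into the dimension
fact_measure = rng.random(N_FACT)           # measure column to aggregate
dim_group = rng.integers(0, N_GROUP, N_DIM) # group-by attribute of the dimension
dim_selected = rng.random(N_DIM) < 0.3      # precomputed predicate bitmap

# Vectorized OLAP pipeline: filter, join via array-index reference,
# then grouped aggregation, processed one vector at a time.
agg = np.zeros(N_GROUP)
for start in range(0, N_FACT, VEC):
    fk = fact_fk[start:start + VEC]
    m = fact_measure[start:start + VEC]
    mask = dim_selected[fk]                       # "join" = positional lookup
    np.add.at(agg, dim_group[fk[mask]], m[mask])  # scatter-add per group

# Same result computed in one pass, for verification.
mask_all = dim_selected[fact_fk]
check = np.bincount(dim_group[fact_fk[mask_all]],
                    weights=fact_measure[mask_all], minlength=N_GROUP)
assert np.allclose(agg, check)
```

The vector-at-a-time loop bounds the working set of each step, which is the property the vectorized model exploits to keep intermediate results inside fast (device) memory; the array-index join avoids building a hash table for the foreign-key relationship.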

Cite this article:

Zhang YS, Liu Z, Han RC, Zhang Y, Wang S. OLAP optimization techniques based on GPU database. Ruan Jian Xue Bao/Journal of Software, 2023, 34(11): 5205-5229 (in Chinese with English abstract).
History:
  • Received: 2022-03-15
  • Revised: 2022-04-22
  • Published online: 2023-06-16
  • Published: 2023-11-06
Copyright © Institute of Software, Chinese Academy of Sciences. Email: jos@iscas.ac.cn