OLAP Optimization Techniques Based on GPU Database
Author biographies:

Zhang Yansong (1973-), male, PhD, associate professor; research interests: in-memory databases, GPU databases, new-hardware database technology. Liu Zhuan (1996-), male, master's degree; research interests: GPU databases, in-memory databases. Han Ruichen (1997-), male, master's student; research interests: in-memory databases, new-hardware databases. Zhang Yu (1977-), female, PhD, senior engineer; research interests: data warehousing, OLAP. Wang Shan (1944-), female, professor, CCF Fellow; research interests: databases, data warehousing, big data management.

Corresponding author:

Zhang Yu, yuzhang@cma.gov.cn

CLC number:

TP311

Funding:

National Natural Science Foundation of China (61772533, 61732014); Beijing Natural Science Foundation (4192066)



    Abstract:

    Graphics processing unit (GPU) databases have attracted considerable attention from academia and industry in recent years. Although quite a few prototype and commercial systems (including open-source systems) have been developed as next-generation database systems, whether GPU-based online analytical processing (OLAP) engines really outperform central processing unit (CPU)-based systems remains in doubt; if they do, what kinds of workload/data/query processing models suit them better calls for more in-depth research. GPU-based OLAP engines follow two major technical roadmaps: the GPU in-memory processing mode and the GPU-accelerated mode. The former stores all datasets in GPU device memory to take full advantage of the GPU's computing power and high-bandwidth memory; its drawbacks are that the limited capacity of GPU device memory restricts the dataset size and that memory-resident data under sparse access patterns reduces the storage efficiency of GPU device memory. The latter stores only part of the datasets in GPU device memory and uses the GPU to accelerate computation-intensive workloads so as to support large datasets; the key challenges are how to choose the optimal data-distribution and workload-distribution models for GPU device memory to minimize peripheral component interconnect express (PCIe) transfer overhead and maximize GPU computation efficiency. This study focuses on integrating the two technical roadmaps into an accelerated OLAP engine and proposes OLAP Accelerator, a customized OLAP framework for hybrid CPU-GPU platforms. It designs three OLAP calculation models, namely the CPU in-memory, GPU in-memory, and GPU-accelerated models, and proposes a vectorized query processing technique for the GPU platform to optimize device memory utilization and query performance. Furthermore, the different technical roadmaps of GPU databases and their performance characteristics are explored. The experimental results show that the GPU in-memory vectorized query processing model achieves the best performance and memory efficiency, running 3.1 and 4.2 times faster than OmniSciDB and Hyper, respectively. The partition-based GPU-accelerated mode accelerates only the join workloads to balance the load between the CPU and GPU sides and can support larger datasets than the GPU in-memory mode.
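As an illustration only (not the paper's implementation), the vectorized query processing model described in the abstract can be sketched on the CPU side: a star-schema fact table is scanned one fixed-size vector at a time, the foreign-key join is resolved by direct array-index reference into the dimension table rather than by a hash table, and the grouped aggregate is accumulated with a scatter-add. NumPy arrays stand in for GPU device-memory columns, and all table sizes, column names, and the vector width are hypothetical.

```python
import numpy as np

# Hypothetical star-schema fragment: one fact table (foreign-key column
# plus a measure) and one dimension table. Arrays stand in for columnar
# storage; on a GPU each step below would be one vectorized kernel.
rng = np.random.default_rng(0)
N_FACT, N_DIM, N_GROUP, VEC = 1_000_000, 100, 8, 4096

fact_fk = rng.integers(0, N_DIM, N_FACT)    # foreign keys into the dimension
fact_measure = rng.random(N_FACT)           # measure column to aggregate
dim_group = rng.integers(0, N_GROUP, N_DIM) # group-by attribute of the dimension
dim_selected = rng.random(N_DIM) < 0.3      # precomputed predicate bitmap

# Vectorized OLAP pipeline: filter, join via array-index reference,
# then grouped aggregation, processed one vector at a time.
agg = np.zeros(N_GROUP)
for start in range(0, N_FACT, VEC):
    fk = fact_fk[start:start + VEC]
    m = fact_measure[start:start + VEC]
    mask = dim_selected[fk]                       # "join" = positional lookup
    np.add.at(agg, dim_group[fk[mask]], m[mask])  # scatter-add per group

# Same result computed in one pass, for verification.
mask_all = dim_selected[fact_fk]
check = np.bincount(dim_group[fact_fk[mask_all]],
                    weights=fact_measure[mask_all], minlength=N_GROUP)
assert np.allclose(agg, check)
```

The vector-at-a-time loop bounds the working set of each step, which is the property the vectorized model exploits to keep intermediate results inside fast (device) memory; the array-index join avoids building a hash table for the foreign-key relationship.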

Cite this article:

Zhang YS, Liu Z, Han RC, Zhang Y, Wang S. OLAP optimization techniques based on GPU database. Ruan Jian Xue Bao/Journal of Software, 2023, 34(11): 5205-5229 (in Chinese with English abstract).
History:
  • Received: 2022-03-15
  • Revised: 2022-04-22
  • Published online: 2023-06-16
  • Published: 2023-11-06
Copyright © Institute of Software, Chinese Academy of Sciences. Email: jos@iscas.ac.cn