The Benchmarking Ability of HTAP Benchmarks
Author:
Affiliations:

1. School of Data Science and Engineering, East China Normal University; 2. OceanBase, Ant Group; 3. The Fifth Electronics Research Institute of the Ministry of Industry and Information Technology

Fund Project:

The National Natural Science Foundation of China (General Program, Key Program, Major Research Plan)

    Abstract:

    The demand for efficient real-time analysis of data that has just been modified in a database system has driven the rapid development of Hybrid Transactional/Analytical Processing (HTAP) database systems, which support both OLTP and OLAP workloads at the same time. With many HTAP database systems now available, defining and implementing benchmarks that evaluate their new features is crucial for fair comparison and healthy development. In this paper, we first analyze the key features of HTAP database systems and summarize the key technologies used to implement them. We then identify the difficulties of designing HTAP database systems and the challenges of building HTAP benchmarks, and on this basis propose the design dimensions that an HTAP benchmark should consider: data generation, workload generation, evaluation metrics, and architecture supportability. We compare existing HTAP benchmarks with respect to these design dimensions and their implementation techniques, and summarize their strengths and weaknesses in each dimension. In addition, we run typical publicly available benchmarks to demonstrate and analyze how well they evaluate the key features of HTAP database systems and how well they support horizontal comparison across different HTAP database systems. Finally, we summarize the capabilities required of HTAP benchmarks and point out that semantically consistent workload control and metrics for fresh data access are the key open problems in defining HTAP benchmarks.

History
  • Received: 2023-09-13
  • Last revised: 2023-12-25
  • Accepted: 2024-05-15