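Geno: Cost-based Heterogeneous Fusion Query Optimizer

Author biographies:

Tu Yaofeng (屠要峰, 1972-), male, PhD candidate, research fellow, CCF senior member. Main research areas: big data, databases, machine learning, cloud computing.
Bian Fusheng (卞福升, 1971-), male, bachelor's degree. Main research area: databases.
Chen Xiaoqiang (陈小强, 1975-), male, bachelor's degree, CCF professional member. Main research areas: databases, heterogeneous computing.
Wu Fei (吴非, 1991-), male, master's degree. Main research areas: databases, heterogeneous computing.
Zhou Shijun (周士俊, 1979-), male, bachelor's degree. Main research areas: databases, cloud computing.
Chen Bing (陈兵, 1970-), male, professor, doctoral supervisor, CCF distinguished member. Main research areas: big data, cloud computing, cognitive radio networks.

Corresponding author:

Tu Yaofeng, E-mail: 13605151819@qq.com

Funding:

National Key Research and Development Program of China (2019YFB2102002); Key Research and Development Program of Jiangsu Province (BE2019012)
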
    Abstract:

    New hardware and the environments built on it have changed traditional computing, storage, and network architectures, and with them the design assumptions of upper-layer software. In particular, heterogeneous computing architectures that combine general-purpose processors with dedicated accelerators have changed both the underlying framework design of database systems and the cost model of query optimization. Database systems must adapt to the characteristics of new hardware to exploit its full potential. This paper proposes Geno, a cost-based query optimizer for fused CPU/GPU/FPGA heterogeneous computing that flexibly schedules and makes the best use of the various computing resources. The main contributions are as follows: (1) it finds that adjusting cost parameters to the actual hardware capabilities of the system environment significantly improves the accuracy of query plans, and proposes a cost calculation method and calibration tool for heterogeneous resources; (2) by estimating the capabilities of heterogeneous hardware such as GPUs and FPGAs and calibrating against the actual hardware capabilities of the database system, it establishes a cost model for query processing in heterogeneous computing environments; (3) it implements GPU and FPGA operators supporting selection, projection, join, and aggregation, together with GPU operator fusion and pipelining and FPGA operator pipelining; (4) it solves operator assignment and scheduling through cost-based evaluation, generating heterogeneous cooperative execution plans that coordinate the heterogeneous computing resources and make full use of each one's strengths. Experimental results show that the parameter values calibrated by Geno match the actual hardware capabilities more closely. Compared with PostgreSQL and the GPU database HeteroDB, Geno generates more reasonable query plans. In the TPC-H experiments with row-store tables, Geno reduces execution time by 64%-93% relative to PostgreSQL and by 1%-39% relative to HeteroDB; with column-store tables, it reduces execution time by 87%-92% relative to PostgreSQL and by 1%-81% relative to HeteroDB. Geno with column storage reduces query execution time by 32%-89% compared with Geno with row storage.
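
The cost-based operator assignment summarized in the abstract can be illustrated with a minimal sketch. Everything in it is hypothetical: the `DeviceCost` fields, the linear cost formula (fixed startup plus per-tuple transfer and processing), and all numbers are assumptions chosen for illustration, not Geno's actual cost model or calibrated values.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DeviceCost:
    """Hypothetical calibrated cost parameters for one device (arbitrary units)."""
    name: str
    startup: float             # fixed cost to launch an operator on the device
    transfer_per_tuple: float  # cost to move one tuple to the device (0 for CPU)
    process_per_tuple: float   # cost to process one tuple on the device


def operator_cost(dev: DeviceCost, n_tuples: int) -> float:
    # Assumed linear model: startup plus per-tuple transfer and processing.
    return dev.startup + n_tuples * (dev.transfer_per_tuple + dev.process_per_tuple)


def choose_device(devices: list[DeviceCost], n_tuples: int) -> DeviceCost:
    # Assign the operator to the device with the lowest estimated cost.
    return min(devices, key=lambda d: operator_cost(d, n_tuples))


# Made-up parameters; a calibration tool would measure these on real hardware.
DEVICES = [
    DeviceCost("cpu", startup=0.0, transfer_per_tuple=0.0, process_per_tuple=0.01),
    DeviceCost("gpu", startup=500.0, transfer_per_tuple=0.002, process_per_tuple=0.0005),
    DeviceCost("fpga", startup=200.0, transfer_per_tuple=0.003, process_per_tuple=0.002),
]

if __name__ == "__main__":
    for n in (1_000, 100_000, 10_000_000):
        print(n, choose_device(DEVICES, n).name)
```

With these made-up numbers, small inputs stay on the CPU because accelerator startup and transfer costs dominate, while larger inputs shift to an accelerator. This is the kind of trade-off that calibrating cost parameters against real hardware is meant to capture.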

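Cite this article

Tu Yaofeng, Chen Xiaoqiang, Zhou Shijun, Bian Fusheng, Wu Fei, Chen Bing. Geno: Cost-based Heterogeneous Fusion Query Optimizer. Journal of Software (软件学报), 2022, 33(3): 774-796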

History
  • Received: 2021-06-29
  • Last revised: 2021-07-31
  • Published online: 2021-10-21
  • Published: 2022-03-06