AlphaQO:鲁棒的学习型查询优化器
作者:
作者简介:

余翔(1994-),男,博士生,主要研究领域为人工智能驱动的数据库优化技术;
汤南(1981-),男,博士,研究员,主要研究领域为数据准备,数据可视化;
柴成亮(1992-),男,博士,CCF专业会员,主要研究领域为人机协作的数据管理,数据库系统;
孙佶(1994-),男,博士,主要研究领域为分布式相似查询,人工智能和数据库交叉技术;
张辛宁(2000-),男,本科生,主要研究领域为人工智能驱动的数据库优化技术;
李国良(1981-),男,博士,教授,博士生导师,CCF杰出会员,主要研究领域为大数据,数据库,数据科学.

通讯作者:

李国良,E-mail:liguoliang@tsinghua.edu.cn

基金项目:

国家自然科学基金(61925205,61632016);华为和好未来资助项目


AlphaQO: Robust Learned Query Optimizer
Author:
  • 摘要
  • | |
  • 访问统计
  • |
  • 参考文献 [33]
  • |
  • 相似文献 [20]
  • | | |
  • 文章评论
    摘要:

    由深度学习驱动的学习型查询优化器正在越来越广泛地受到研究者的关注,这些优化器往往能够取得近似甚至超过传统商业优化器的性能.与传统优化器不同的是,一个成功的学习型优化器往往依赖于足够多的高质量的负载查询作为训练数据.低质量的训练查询会导致学习型优化器在未来的查询上失效.提出了基于强化学习的鲁棒的学习型查询优化器训练框架AlphaQO,提前找到学习型优化器做不好的查询,以提高学习型优化器的鲁棒性.AlphaQO中存在两个重要部分:查询生成器和学习型优化器.查询生成器的目标是生成“难”的查询(传统优化器做得好,但是学习型优化器反而做得不好的查询).学习型优化器利用这些生成的查询进行测试和训练,并提供反馈让查询生成器进行更新.系统迭代交替的运行上述两个部分,分别进行训练.目的在于在提供尽量少的信息和消耗足够小的时间下找到足够多“难”的并且未见的查询给优化器训练,以提高学习型优化器的鲁棒性.实验结果显示:该生成器会提供越来越难的训练查询给学习型优化器;同时,这些查询能够提升学习型优化器的性能.

    Abstract:

    Learned database query optimizers, which are typically empowered by (deep) learning models, have attracted significant attention recently, because they can offer similar or even better performance than the state-of-the-art commercial optimizers that require hundreds of expert-hours to tune. A crucial factor of successfully training learned optimizers is training queries. Unfortunately, a good query workload that is sufficient for training learned optimizers is not always available. This study proposes a framework, called AlphaQO, on generating queries for learned optimizers with reinforcement learning (RL). AlphaQO is a loop system that consists of two main components, query generator and learned optimizer. Query generator aims at generating “hard” queries (i.e., those queries that the learned optimizer provides poor estimates). The learned optimizer will be trained using generated queries, as well as providing feedbacks (in terms of numerical rewards) to the query generator. If the generated queries are good, the query generator will get a high reward;otherwise, the query generator will get a low reward. The above process is performed iteratively, with the main goal that within a small budget, the learned optimizer can be trained and generalized well to a wide range of unseen queries. Extensive experiments show that AlphaQO can generate a relatively small number of queries and train a learned optimizer to outperform commercial optimizers. Moreover, learned optimizers need much less queries from AlphaQO than randomly generated queries, in order to well train the learned optimizer.

    参考文献
    [1] Selinger PG, Astrahan MM, Chamberlin DD, Lorie RA, Price TG. Access path selection in a relational database management system. In:Proc. of the Readings in Artificial Intelligence and Databases. Morgan Kaufmann, 1989. 511-522.
    [2] Babcock B, Chaudhuri S. Towards a robust query optimizer:A principled and practical approach. In:Proc. of the 2005 ACM SIGMOD Int'l Conf. on Management of Data. 2005. 119-130.
    [3] Trummer I, Koch C. Solving the join ordering problem via mixed integer linear programming. In:Proc. of the 2017 ACM Int'l Conf. on Management of Data. 2017. 1025-1040.
    [4] Ioannidis YE, Kang YC. Left-deep vs. bushy trees:An analysis of strategy spaces and its implications for query optimization. In: Proc. of the '91 ACM SIGMOD Int'l Conf. on Management of Data. 1991. 168-177.
    [5] Chande SV, Sinha M. Genetic optimization for the join ordering problem of database queries. In:Proc. of the 2011 Annual IEEE India Conf. IEEE, 2011. 1-5.
    [6] Waas F, Pellenkoft A. Join order selection (good enough is easy). In:Proc. of the British National Conf. on Databases. Berlin, Heidelberg:Springer, 2000. 51-67.
    [7] Fegaras L. A new heuristic for optimizing large queries. In:Proc. of the Int'l Conf. on Database and Expert Systems Applications. Berlin, Heidelberg:Springer, 1998. 726-735.
    [8] Marcus R, Negi P, Mao HZ, Zhang C, Alizadeh M, Kraska T, Papaemmanouil O, Tatbul N. Neo:A learned query optimizer. arXiv preprint arXiv:1904.03711, 2019.
    [9] Krishnan S, Yang ZH, Goldberg K, Hellerstein J, Stoica I. Learning to optimize join queries with deep reinforcement learning. arXiv preprint arXiv:1808.03196, 2018.
    [10] Marcus R, Papaemmanouil O. Deep reinforcement learning for join order enumeration. In:Proc. of the 1st Int'l Workshop on Exploiting Artificial Intelligence Techniques for Data Management. 2018. 1-4.
    [11] Yu X, Li GL, Chai CL, Tang N. Reinforcement learning with tree-LSTM for join order selection. In:Proc. of the 2020 IEEE 36th Int'l Conf. on Data Engineering (ICDE). IEEE, 2020. 1297-1308.
    [12] Chaudhuri S. An overview of query optimization in relational systems. In:Proc. of the 17th ACM SIGACT-SIGMOD-SIGART Symp. on Principles of Database Systems. 1998. 34-43.
    [13] Tan WC, Zhang MH, Elmeleegy H, Srivastava D. Reverse engineering aggregation queries. Proc. of the VLDB Endowment, 2017, 10(11):1394-1405.
    [14] Hochreiter S, Schmidhuber J. Long short-term memory. Neural Computation, 1997, 9(8):1735-1780.
    [15] Sutton RS, McAllester D, Singh S, Mansour Y. Policy gradient methods for reinforcement learning with function approximation. In: Proc. of the NIPs, Vol.99. 1999.
    [16] Williams RJ. Simple statistical gradient-following algorithms for connectionist reinforcement learning. Machine Learning, 1992, 8(3-4):229-256.
    [17] Leis V, Gubichev A, Mirchev A, Boncz P, Kemper A, Neumann T. How good are query optimizers, really? Proc. of the VLDB Endowment, 2015, 9(3):204-215.
    [18] Amari S. Backpropagation and stochastic gradient descent method. Neurocomputing, 1993, 5(4-5):185-196.
    [19] Nair V, Hinton GE. Rectified linear units improve restricted Boltzmann machines. In:Proc. of the ICML. 2010.
    [20] Trummer I, Wang JX, Wei ZY, Maram D, Moseley S, Jo S, Antonakakis J, Rayabhari A. Skinnerdb:Regret-bounded query evaluation via reinforcement learning. In:Proc. of the 2019 Int'l Conf. on Management of Data. 2019. 1-45.
    [21] Kipf A, Kipf T, Radke B, Leis V, Boncz P, Kemper A. Learned cardinalities:Estimating correlated joins with deep learning. arXiv preprint arXiv:1809.00677, 2018.
    [22] Marcus R, Papaemmanouil O. Plan-structured deep neural network models for query performance prediction. arXiv preprint arXiv:1902.00132, 2019.
    [23] Madry A, Makelov A, Schmidt L, Tsipras D, Vladu A. Towards deep learning models resistant to adversarial attacks. arXiv preprint arXiv:1706.06083, 2017.
    [24] Chakraborty A, Alam M, Dey V, Chattopadhyay A, Mukhopadhyay D. Adversarial attacks and defences:A survey. arXiv preprint arXiv:1810.00069, 2018.
    [25] Ilyas A, Engstrom L, Athalye A, Lin J. Black-box adversarial attacks with limited queries and information. In:Proc. of the Int'l Conf. on Machine Learning. PMLR, 2018. 2137-2146.
    [26] Guo C, Gardner J, You YR, Wilson AG, Weinberger K. Simple black-box adversarial attacks. In:Proc. of the Int'l Conf. on Machine Learning. PMLR, 2019. 2484-2493.
    [27] Bojchevski A, Shchur O, Zügner D, Günnemann S. Netgan:Generating graphs via random walks. In:Proc. of the Int'l Conf. on Machine Learning. PMLR, 2018. 610-619.
    [28] Yu LT, Zhang WN, Wang J, Yu Y. Seqgan:Sequence generative adversarial nets with policy gradient. In:Proc. of the AAAI Conf. on Artificial Intelligence. 2017.
    [29] Fan J, Liu TY, Li GL, Chen JY, Shen YW, Du XY. Relational data synthesis using generative adversarial networks:A design space exploration. arXiv preprint arXiv:2008. 12763, 2020.
    [30] Xu L, Veeramachaneni K. Synthesizing tabular data using generative adversarial networks. arXiv preprint arXiv:1811.11264, 2018.
    [31] Park N, Mohammadi M, Gorde K, Jajodia S, Park H, Kim Y. Data synthesis based on generative adversarial networks. arXiv preprint arXiv:1806.03384, 2018.
    [32] Poess M, Jr. Stephens JM. Generating thousand benchmark queries in seconds. In:Proc. of the 30th Int'l Conf. on Very Large Data Bases, Vol.30. 2004. 1045-1053.
    [33] Vartak M, Raghavan V, Rundensteiner EA. Qrelx:Generating meaningful queries that provide cardinality assurance. In:Proc. of the 2010 ACM SIGMOD Int'l Conf. on Management of Data. 2010. 1215-1218.
    [34] Khalek SA, Elkarablieh B, Laleye YO, Khurshid S. Query-aware test generation using a relational constraint solver. In:Proc. of the 200823rd IEEE/ACM Int'l Conf. on Automated Software Engineering. IEEE, 2008. 238-247.
    引证文献
    网友评论
    网友评论
    分享到微博
    发 布
引用本文

余翔,柴成亮,张辛宁,汤南,孙佶,李国良. AlphaQO:鲁棒的学习型查询优化器.软件学报,2022,33(3):814-831

复制
分享
文章指标
  • 点击次数:1564
  • 下载次数: 4479
  • HTML阅读次数: 3678
  • 引用次数: 0
历史
  • 收稿日期:2021-06-30
  • 最后修改日期:2021-07-31
  • 在线发布日期: 2021-10-21
  • 出版日期: 2022-03-06
文章二维码
您是第19705408位访问者
版权所有:中国科学院软件研究所 京ICP备05046678号-3
地址:北京市海淀区中关村南四街4号,邮政编码:100190
电话:010-62562563 传真:010-62562533 Email:jos@iscas.ac.cn
技术支持:北京勤云科技发展有限公司

京公网安备 11040202500063号