Cardinality and Cost Estimator Based on Tree Gated Recurrent Unit
Authors:
Author biographies:

QIAO Shao-Jie (1981-), male, Ph.D., professor, CCF distinguished member; research interests: databases, artificial intelligence, data mining.
CHEN Hao (1989-), male, bachelor's degree; research interests: database optimizers and executors.
YANG Guo-Ping (1997-), male, master's student, CCF student member; research interests: database query optimization.
MAO Rui (1975-), male, Ph.D., professor, doctoral supervisor, CCF distinguished member; research interests: big data.
HAN Nan (1984-), female, Ph.D., associate professor; research interests: databases, data mining.
YUAN Chang-An (1964-), male, Ph.D., professor, doctoral supervisor, CCF professional member; research interests: databases.
QU Lu-Lu (1998-), female, master's student, CCF student member; research interests: artificial intelligence, data mining.
Louis Alberto GUTIERREZ (1980-), male, Ph.D., researcher; research interests: artificial intelligence.

Corresponding author:

HAN Nan, E-mail: hannan@cuit.edu.cn

Foundation items:

National Natural Science Foundation of China (61772091, 61802035, 61962006, 61962038, U1802271, U2001212, 62072311); CCF-Huawei Database Innovation Research Program (CCF-HuaweiDBIR2020004A); Sichuan Science and Technology Program (2021JDJQ0021, 2020YJ0481, 2020YJ0430); Major Science and Technology Innovation Project of Chengdu (2021-YF08-00156-GX); Chengdu Technology Innovation R&D Project (2021-YF05-00491-SN); Sichuan Provincial Key Laboratory of Digital Media Art, Sichuan Conservatory of Music (21DMAKL02); Natural Science Foundation of Guangxi (2018GXNSFDA138005); Guangdong Basic and Applied Basic Research Foundation (2020B1515120028)


Cardinality and Cost Estimator Based on Tree Gated Recurrent Unit
Abstract:

Cardinality estimation and cost estimation guide the selection of execution plans, and estimation accuracy is critical for the query optimizer. However, the cost and cardinality estimation techniques of traditional databases cannot provide accurate estimates, because they do not consider correlations across multiple tables. Applying artificial intelligence techniques to databases (artificial intelligence for databases, AI4DB) has recently attracted wide attention, and results show that learning-based estimation methods outperform traditional ones. However, existing learning-based methods still have drawbacks. First, most of them estimate only cardinality and ignore cost estimation. Second, they can handle only simple query statements and fail on complex queries such as multi-table and nested queries; they also struggle with string-valued attributes. To address these problems, this study proposes an estimation method based on the tree-structured gated recurrent unit, Tree-GRU (tree-gated recurrent unit), which estimates cardinality and cost simultaneously. In addition, effective feature extraction and encoding techniques are adopted: both the query and the execution plan are considered during feature extraction, and the resulting features are embedded into the Tree-GRU. For string values, a neural network automatically extracts the relationship between substrings and the whole string and embeds the strings, so that sparse strings become easy for the estimator to process. Extensive experiments on the JOB and Synthetic datasets show that the proposed model outperforms mainstream algorithms in all aspects.
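The abstract names the Tree-GRU but does not give its formulation. As a non-authoritative illustration, the numpy sketch below shows one plausible way a GRU cell extends to tree-shaped execution plans: each plan node pools its children's hidden states into a single "previous" state (element-wise max here, an assumed choice) and then applies the standard GRU update. The weight shapes, the pooling operator, and the tiny two-scan/one-join plan are all illustrative assumptions, not the paper's exact architecture.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class TreeGRUCell:
    """Minimal GRU cell generalized to tree inputs (illustrative sketch).

    A node's children are pooled element-wise (max) into one hidden state,
    which then plays the role of h_{t-1} in the usual GRU equations.
    """
    def __init__(self, input_size, hidden_size, seed=0):
        rng = np.random.default_rng(seed)
        s = hidden_size
        # One stacked weight matrix per gate: [W_x | W_h].
        self.Wz = rng.normal(0.0, 0.1, (s, input_size + s))  # update gate
        self.Wr = rng.normal(0.0, 0.1, (s, input_size + s))  # reset gate
        self.Wh = rng.normal(0.0, 0.1, (s, input_size + s))  # candidate
        self.hidden_size = s

    def forward(self, x, child_states):
        # Leaves (e.g. table scans) have no children: use a zero state.
        if child_states:
            h_prev = np.max(np.stack(child_states), axis=0)
        else:
            h_prev = np.zeros(self.hidden_size)
        z = sigmoid(self.Wz @ np.concatenate([x, h_prev]))
        r = sigmoid(self.Wr @ np.concatenate([x, h_prev]))
        h_tilde = np.tanh(self.Wh @ np.concatenate([x, r * h_prev]))
        return (1.0 - z) * h_prev + z * h_tilde

# Bottom-up pass over a toy plan: two scan leaves under one join node.
cell = TreeGRUCell(input_size=8, hidden_size=16)
scan_a = cell.forward(np.ones(8), [])
scan_b = cell.forward(np.full(8, 0.5), [])
join = cell.forward(np.zeros(8), [scan_a, scan_b])
```

In a full estimator, the root node's hidden state would feed two small output heads, one regressing cardinality and one regressing cost; that wiring is omitted here.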

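For the string-handling step, the abstract says a neural network learns the relationship between substrings and the whole string. A minimal stand-in for that idea, sketched here under stated assumptions, is a bag-of-substrings embedding in the spirit of fastText: hash each character n-gram of the value into a shared embedding table and average the rows, so strings sharing substrings land near each other. The hash function, table size, dimensions, and example strings are all illustrative; the paper trains this mapping rather than using a fixed random table.

```python
import numpy as np

def _bucket(gram, buckets):
    # Simple deterministic polynomial hash over characters (assumed scheme).
    v = 0
    for ch in gram:
        v = (v * 131 + ord(ch)) % buckets
    return v

def substring_embedding(s, table, n=3):
    """Average the bucket embeddings of the string's character n-grams."""
    grams = [s[i:i + n] for i in range(max(1, len(s) - n + 1))]
    rows = [table[_bucket(g, table.shape[0])] for g in grams]
    return np.mean(rows, axis=0)

def cos(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Shared bucket embedding table (random here; learned in the paper's setup).
rng = np.random.default_rng(0)
table = rng.normal(0.0, 0.1, (1024, 32))

e1 = substring_embedding("thriller", table)
e2 = substring_embedding("thrillers", table)   # shares most 3-grams with e1
e3 = substring_embedding("comedy", table)      # shares none
```

Because "thriller" and "thrillers" share six of their seven 3-grams, their embeddings are far more similar than either is to "comedy", which gives the downstream estimator a dense, similarity-preserving input in place of sparse raw strings.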
Cite this article:

QIAO Shao-Jie, YANG Guo-Ping, HAN Nan, QU Lu-Lu, CHEN Hao, MAO Rui, YUAN Chang-An, Louis Alberto GUTIERREZ. Cardinality and cost estimator based on tree gated recurrent unit. Ruan Jian Xue Bao/Journal of Software, 2022, 33(3): 797-813 (in Chinese).

History
  • Received: 2021-06-30
  • Revised: 2021-07-31
  • Published online: 2021-10-21
  • Published: 2022-03-06