Cardinality and Cost Estimator Based on Tree Gated Recurrent Unit

    Abstract:

    Cardinality estimation and cost estimation guide the selection of query execution plans, so estimation accuracy is critical to the query optimizer. However, the cardinality and cost estimation techniques in traditional databases cannot provide accurate estimates because they do not model correlations across multiple tables. Recently, applying artificial intelligence techniques to databases (artificial intelligence for databases, AI4DB) has attracted wide attention, and results show that learning-based estimation methods outperform traditional ones. Nevertheless, existing learning-based methods have several drawbacks. First, most of them estimate only cardinality and ignore cost. Second, they handle only simple query statements and fail on complex queries such as multi-table and nested queries. They also struggle with string-valued attributes. To address these problems, this study proposes a novel cardinality and cost estimation method based on Tree-GRU (tree-structured gated recurrent unit), which estimates both cardinality and cost. In addition, an effective feature extraction and encoding scheme is applied that considers both the query and the execution plan, and the extracted features are embedded into the Tree-GRU. For string-valued attributes, a neural network automatically captures the relationship between substrings and the whole string and embeds the string, so that sparse strings can be handled easily by the estimator. Extensive experiments on the JOB and Synthetic datasets show that the proposed model outperforms state-of-the-art algorithms in all aspects.
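    The core idea described above can be illustrated with a minimal sketch: each execution-plan node is encoded by a tree-structured GRU cell that combines the node's feature vector with the aggregated hidden states of its children, and the root representation feeds two output heads, one for cardinality and one for cost. This is not the paper's implementation; the cell layout, the sum aggregation over children, and the untrained linear heads are illustrative assumptions.

    ```python
    import numpy as np

    rng = np.random.default_rng(0)

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    class TreeGRUCell:
        """Minimal tree-structured GRU cell: mixes a plan node's feature
        vector x with the summed hidden states of its children."""
        def __init__(self, feat_dim, hidden_dim):
            self.hidden_dim = hidden_dim
            d = feat_dim + hidden_dim  # gates act on [x ; h_children]
            self.Wz = rng.normal(0, 0.1, (hidden_dim, d))  # update gate
            self.Wr = rng.normal(0, 0.1, (hidden_dim, d))  # reset gate
            self.Wh = rng.normal(0, 0.1, (hidden_dim, d))  # candidate state

        def __call__(self, x, child_states):
            # Leaves (scans) have no children: start from a zero state.
            h_c = (np.sum(child_states, axis=0)
                   if child_states else np.zeros(self.hidden_dim))
            xc = np.concatenate([x, h_c])
            z = sigmoid(self.Wz @ xc)
            r = sigmoid(self.Wr @ xc)
            h_tilde = np.tanh(self.Wh @ np.concatenate([x, r * h_c]))
            return (1 - z) * h_c + z * h_tilde

    def encode_plan(node, cell):
        """Bottom-up pass over a plan tree {'feat': vec, 'children': [...]}."""
        child_states = [encode_plan(c, cell) for c in node.get("children", [])]
        return cell(np.asarray(node["feat"], dtype=float), child_states)

    # Toy two-way join plan with one-hot operator features (hypothetical).
    cell = TreeGRUCell(feat_dim=4, hidden_dim=8)
    plan = {"feat": [1, 0, 0, 0], "children": [
        {"feat": [0, 1, 0, 0], "children": []},
        {"feat": [0, 0, 1, 0], "children": []},
    ]}
    root_h = encode_plan(plan, cell)

    # Joint estimation: two linear heads share the root representation,
    # mirroring the abstract's claim of predicting cardinality and cost
    # together (weights here are random and untrained).
    w_card, w_cost = rng.normal(size=8), rng.normal(size=8)
    card_est, cost_est = w_card @ root_h, w_cost @ root_h
    ```

    Sharing one tree encoder between the two heads lets the cardinality and cost tasks regularize each other, which is one plausible reason a joint estimator can beat two separate models.
    
    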

Get Citation

Qiao SJ, Yang GP, Han N, Qu LL, Chen H, Mao R, Yuan CA, Gutierrez LA. Cardinality and cost estimator based on tree gated recurrent unit. Ruan Jian Xue Bao/Journal of Software, 2022, 33(3): 797-813 (in Chinese with English abstract)

History
  • Received: June 30, 2021
  • Revised: July 31, 2021
  • Online: October 21, 2021
  • Published: March 06, 2022
Copyright: Institute of Software, Chinese Academy of Sciences Beijing ICP No. 05046678-4