Inductive SQL Synthesis with Positive and Negative Tuples
Author:
Affiliation:

  • Article
  • | |
  • Metrics
  • |
  • Reference [27]
  • |
  • Related [20]
  • | | |
  • Comments
    Abstract:

    SQL is a programming language that is widely used to operate relation databases. Many users (such as data analysts and junior programmers) will encounter various difficulties when writing SQL query programs due to the lack of programming experience and knowledge of SQL syntax. Currently, the research on the automatic synthesis of SQL query programs from the <input-output> (I/O) example tables has attracted more and more attention. The inductive SQL synthesis with positive and negative tuples (ISST) method proposed in this study can automatically synthesize SQL query programs that meet the users’ expectations by the I/O example tables edited by users and containing a small number of tuples. The ISST method contains five main stages: constructing the SQL query program sketches, expanding the worksheet data, dividing the sets of positive and negative examples, inductively synthesizing selection predicates, and sorting after verifying. The candidate set of SQL query programs is verified on the online database PostgreSQL, and the candidate set of synthesized SQL query programs is scored and sorted according to the principle of Occam’s razor. The ISST method is implemented using the Java language and then is evaluated on a test set containing 28 samples. The results reveal that the ISST method can correctly synthesize 24 of the samples, which takes an average of 2 seconds.

    Reference
    [1] TIOBE. 2022. https://www.tiobe.com/tiobe-index/
    [2] Stack Overflow. 2022. https://stackoverflow.com/
    [3] 刘斌斌, 董威, 王戟. 智能化的程序搜索与构造方法综述. 软件学报, 2018, 29(8): 2180-2197. http://www.jos.org.cn/1000-9825/5529.htm
    Liu BB, Dong W, Wang J. Survey on intelligent search and construction methods of program. Ruan Jian Xue Bao/Journal of Software, 2018, 29(8): 2180-2197 (in Chinese with English abstract). http://www.jos.org.cn/1000-9825/5529.htm
    [4] Gulwani S. Automating string processing in spreadsheets using input-output examples. In: Proc. of the 38th Annual ACM SIGPLAN-SIGACT Symp. on Principles of Programming Languages. Austin: ACM, 2011. 317-330.
    [5] Le V, Gulwani S. FlashExtract: A framework for data extraction by examples. In: Proc. of the 35th ACM SIGPLAN Conf. on Programming Language Design and Implementation. Edinburgh: ACM, 2014. 542-553.
    [6] Zhang MH, Elmeleegy H, Procopiuc CM, Srivastava D. Reverse engineering complex join queries. In: Proc. of the 2013 ACM SIGMOD Int’l Conf. on Management of Data. New York: ACM, 2013. 809-820.
    [7] Zhang S, Sun YY. Automatically synthesizing SQL queries from input-output examples. In: Proc. of the 28th IEEE/ACM Int’l Conf. on Automated Software Engineering. Silicon Valley: IEEE, 2013. 224-234.
    [8] Tan WC, Zhang MH, Elmeleegy H, Srivastava D. Reverse engineering aggregation queries. Proceedings of the VLDB Endowment, 2017, 10(11): 1394-1405. [doi: 10.14778/3137628.3137648]
    [9] Wang CL, Cheung A, Bodik R. Synthesizing highly expressive SQL queries from input-output examples. ACM SIGPLAN Notices, 2017, 52(6): 452-466. [doi: 10.1145/3140587.3062365]
    [10] Wang CL, Cheung A, Bodik R. Interactive query synthesis from input-output examples. In: Proc. of the 2017 ACM Int’l Conf. on Management of Data. Chicago: ACM, 2017. 1631-1634.
    [11] Cheng L. SqlSol: An accurate SQL query synthesizer. In: Proc. of the 21st Int’l Conf. on Formal Engineering Methods Formal Methods and Software Engineering. Shenzhen: Springer, 2019. 104-120.
    [12] Orvalho P, Terra-Neves M, Ventura M, Martins R, Manquinho M. SQUARES: A SQL synthesizer using query reverse engineering. Proceedings of the VLDB Endowment, 2020, 13(12): 2853-2856. [doi: 10.14778/3415478.3415492]
    [13] Thakkar A, Naik A, Sands N, Alur R, Naik M, Raghothaman M. Example-guided synthesis of relational queries. In: Proc. of the 42nd ACM SIGPLAN Int’l Conf. on Programming Language Design and Implementation. ACM, 2021. 1110-1125.
    [14] Frank E, Witten IH. Generating accurate rule sets without global optimization. In: Proc. of the 15th Int’l Conf. on Machine Learning. Madison: Morgan Kaufmann Publishers Inc., 1998. 144-151.
    [15] De Moura L, Bjørner N. Z3: An efficient SMT solver. In: Proc. of the 14th Int’l Conf. on Tools and Algorithms for the Construction and Analysis of Systems. Budapest: Springer, 2008. 337-340.
    [16] Martins R, Chen J, Chen YJ, Feng Y, Dillig I. Trinity: An extensible synthesis framework for data science. Proceedings of the VLDB Endowment, 2019, 12(12): 1914-1917. [doi: 10.14778/3352063.3352098]
    [17] Tran QT, Chan CY, Parthasarathy S. Query by output. In: Proc. of the 2009 ACM SIGMOD Int’l Conf. on Management of Data. Providence: ACM, 2009. 535-548.
    [18] Tran QT, Chan CY, Parthasarathy S. Query reverse engineering. The VLDB Journal, 2014, 23(5): 721-746. [doi: 10.1007/s00778-013-0349-3]
    [19] Kalashnikov DV, Lakshmanan LVS, Srivastava D. FastQRE: Fast query reverse engineering. In: Proc. of the 2018 Int’l Conf. on Management of Data. Houston: ACM, 2018. 337-350.
    [20] Bonifati A, Ciucanu R, Staworko S. Learning join queries from user examples. ACM Transactions on Database Systems, 2016, 40(4): 24. [doi: 10.1145/2818637]
    [21] Zloof MM. Query by example. In: Proc. of the 1975 National Computer Conf. and Exposition. Anaheim: ACM, 1975. 431-438.
    [22] Panev K, Michel S. Reverse engineering top-K database queries with PALEO. In: Proc. of the 19th Int’l Conf. on Extending Database Technology. Bordeaux: OpenProceedings.org, 2016. 113-124.
    [23] Shen YY, Chakrabarti K, Chaudhuri S, Ding BL, Novik L. Discovering queries based on example tuples. In: Proc. of the 2014 ACM SIGMOD Int’l Conf. on Management of Data. Snowbird: ACM, 2014. 493-504.
    [24] Yaghmazadeh N, Wang YP, Dillig I, Dillig T. SQLizer: Query synthesis from natural language. Proceedings of the ACM on Programming Languages, 2017, 1(OOPSLA): 63. [doi: 10.1145/3133887]
    [25] Yu T, Yasunaga M, Yang K, Zhang R, Wang DX, Li ZF, Radev D. SyntaxSQLNet: Syntax tree networks for complex and cross-domain text-to-SQL task. In: Proc. of the 2018 Conf. on Empirical Methods in Natural Language Processing. Brussels: Association for Computational Linguistics, 2018. 1653-1663.
    [26] Ramakrishnan R, Gehrke J. Database Management Systems. 3rd ed., Boston: McGraw-Hill, 2002.
    Cited by
    Comments
    Comments
    分享到微博
    Submit
Get Citation

张健,李弋,彭鑫,赵文耘.正反例归纳合成SQL查询程序.软件学报,2023,34(9):4132-4152

Copy
Share
Article Metrics
  • Abstract:703
  • PDF: 2332
  • HTML: 1130
  • Cited by: 0
History
  • Received:August 04,2021
  • Revised:December 10,2021
  • Online: January 04,2023
  • Published: September 06,2023
You are the first2034788Visitors
Copyright: Institute of Software, Chinese Academy of Sciences Beijing ICP No. 05046678-4
Address:4# South Fourth Street, Zhong Guan Cun, Beijing 100190,Postal Code:100190
Phone:010-62562563 Fax:010-62562533 Email:jos@iscas.ac.cn
Technical Support:Beijing Qinyun Technology Development Co., Ltd.

Beijing Public Network Security No. 11040202500063