Efficient Collaborative Query Processing Technique Based on Declarative Inference (基于声明式推理的高效协同查询处理技术)

doi:10.13328/j.cnki.jos.007058

Journal of Software (软件学报), 2024, Vol. 35, No. 12, pp. 5558-5581

CSTR: 32375.14.jos.007058
Authors: 邱志林, 寿黎但, 陈珂, 江大伟, 骆歆远, 陈刚
Affiliation (all authors): College of Computer Science and Technology, Zhejiang University, Hangzhou 310027, China; State Key Laboratory of Blockchain and Data Security (Zhejiang University), Hangzhou 310027, China
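Author biographies:
QIU Zhi-Lin (邱志林, 1997-), male, master's degree. Main research area: optimization of in-database machine learning.
SHOU Li-Dan (寿黎但, 1976-), male, Ph.D., professor, doctoral supervisor, CCF senior member. Main research areas: unstructured data management, mobile social media data management, multimedia mining.
CHEN Ke (陈珂, 1977-), female, Ph.D., associate researcher, CCF professional member. Main research areas: unstructured data management, data mining, privacy protection.
JIANG Da-Wei (江大伟, 1982-), male, Ph.D., researcher, doctoral supervisor. Main research areas: distributed data management, cloud data management, big data management.
LUO Xin-Yuan (骆歆远, 1988-), male, Ph.D., assistant researcher. Main research areas: big data management, big data intelligent computing, information retrieval.
CHEN Gang (陈刚, 1973-), male, Ph.D., professor, doctoral supervisor, CCF distinguished member. Main research areas: databases, big data management systems, big data intelligent computing.
Corresponding author: CHEN Ke (陈珂), E-mail: chenk@zju.edu.cn
CLC number: TP311
Funding: National Key Research and Development Program of China (2022YFB3304100); Fundamental Research Funds for the Central Universities (2021FZZX001-24)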

Efficient Collaborative Query Processing Technique Based on Declarative Inference

Authors: QIU Zhi-Lin, SHOU Li-Dan, CHEN Ke, JIANG Da-Wei, LUO Xin-Yuan, CHEN Gang
Affiliation (all authors): College of Computer Science and Technology, Zhejiang University, Hangzhou 310027, China; State Key Laboratory of Blockchain and Data Security (Zhejiang University), Hangzhou 310027, China


Abstract:

Advances in deep learning have spurred growing interest in extending relational databases with collaborative query processing (CQP) techniques to handle advanced analytical queries over both structured and unstructured data. State-of-the-art CQP methods use user-defined functions (UDFs) to wrap deep neural network (NN) models that process unstructured data, and relational operators to process structured data. UDF-based approaches simplify query writing, letting users submit an analytical query as a single SQL statement. However, in ad hoc data analysis they require users to manually select a suitable and efficient model for the desired performance metrics, which places a heavy burden on users. To address this issue, this work proposes a CQP technique based on declarative inference functions (DIFs), which builds a complete CQP framework by optimizing several query implementation choices, including model selection, execution mode, and device binding. Using the cost model and optimization rules designed in this work, the query processor estimates the cost of alternative query plans and automatically selects the optimal physical query plan. Experimental results on four datasets confirm the effectiveness and efficiency of the proposed DIF-based CQP approach.

Key words: database query optimization; declarative inference function (DIF); collaborative query processing (CQP); model selection
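
As a rough illustration of the cost-based plan selection described in the abstract, the following minimal Python sketch enumerates combinations of model, execution mode, and device, estimates a cost for each, and keeps the cheapest plan that meets a required accuracy. It is not the paper's cost model or optimizer: the names `Model`, `Plan`, `estimate_cost`, and `choose_plan`, as well as all cost factors and accuracy figures, are hypothetical and chosen only to make the idea concrete.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class Model:
    name: str
    accuracy: float      # assumed task accuracy in [0, 1] (hypothetical)
    cost_per_row: float  # assumed relative CPU inference cost per row


@dataclass(frozen=True)
class Plan:
    model: Model
    mode: str    # execution mode: "row" or "batch"
    device: str  # device binding: "cpu" or "gpu"
    cost: float  # estimated total cost in arbitrary units


# Hypothetical multipliers; a real system would calibrate these by profiling.
MODE_FACTOR = {"row": 1.0, "batch": 0.6}
DEVICE_FACTOR = {"cpu": 1.0, "gpu": 0.2}
DEVICE_SETUP = {"cpu": 0.0, "gpu": 50.0}  # fixed start-up / transfer cost


def estimate_cost(model: Model, mode: str, device: str, rows: int) -> float:
    """Toy cost model: fixed device set-up cost plus per-row inference cost."""
    per_row = model.cost_per_row * MODE_FACTOR[mode] * DEVICE_FACTOR[device]
    return DEVICE_SETUP[device] + rows * per_row


def choose_plan(models: list[Model], rows: int, min_accuracy: float) -> Plan:
    """Enumerate all (model, mode, device) plans and return the cheapest
    one whose model meets the declared accuracy requirement."""
    candidates = [
        Plan(m, mode, dev, estimate_cost(m, mode, dev, rows))
        for m, mode, dev in product(models, MODE_FACTOR, DEVICE_FACTOR)
        if m.accuracy >= min_accuracy
    ]
    if not candidates:
        raise ValueError("no model satisfies the accuracy requirement")
    return min(candidates, key=lambda p: p.cost)


if __name__ == "__main__":
    catalog = [Model("small", 0.85, 1.0), Model("large", 0.95, 5.0)]
    best = choose_plan(catalog, rows=1000, min_accuracy=0.9)
    print(best.model.name, best.mode, best.device, best.cost)
```

The point of the sketch is the division of labor: the user states only the requirement (here, a minimum accuracy), while the optimizer decides which model to run, how to run it, and where. With these made-up factors, a small input favors the CPU because the fixed GPU set-up cost dominates, while a large input favors batched GPU execution.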

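Cite this article:

QIU Zhi-Lin, SHOU Li-Dan, CHEN Ke, JIANG Da-Wei, LUO Xin-Yuan, CHEN Gang. Efficient collaborative query processing technique based on declarative inference. Ruan Jian Xue Bao/Journal of Software, 2024, 35(12): 5558-5581 (in Chinese with English abstract).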

Article metrics

Views: 613
Downloads: 2394
HTML views: 645
Citations: 0

History

Received: 2023-04-12
Last revised: 2023-06-05
Accepted:
Published online: 2024-01-17
Published: 2024-12-06
