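[Keywords (Chinese)]
[Abstract (Chinese, translated)]
In the big data era, database systems face three main challenges. First, traditional optimization techniques based on expert experience (e.g., cost estimation, join order selection, knob tuning) can no longer meet the performance requirements of heterogeneous data, massive numbers of applications, and large-scale user populations; learning-based database optimization techniques can be designed to make databases more intelligent. Second, in the AI era, many database applications need AI algorithms, such as image search in a database; AI algorithms can be embedded into the database, database techniques can be used to accelerate them, and AI-based services can be provided inside the database. Third, traditional databases focus on general-purpose hardware (e.g., CPUs) and cannot fully exploit new hardware (e.g., ARM, AI chips). In addition, besides the relational model, databases need to support the tensor model to accelerate AI operations. To address these challenges, a database system with native support for artificial intelligence (AI) is proposed. On one hand, it integrates various AI techniques into the database to provide self-monitoring, self-configuring, self-optimizing, self-diagnosing, self-healing, self-securing, and self-assembling capabilities; on the other hand, it lets the database provide AI capabilities through declarative languages, lowering the barrier to using AI. Five stages toward an AI-native database are introduced, and the challenges of designing one are presented. Autonomous database knob tuning, deep reinforcement learning based query optimization, machine learning based cardinality estimation, and autonomous index/view recommendation are used as examples to demonstrate the advantages of AI-native databases.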
[Keywords]
[Abstract]
In the big data era, database systems face three challenges. First, traditional empirical optimization techniques (e.g., cost estimation, join order selection, knob tuning) cannot meet the high-performance requirements of large-scale data, varied applications, and diverse users. Learning-based techniques are needed to make databases more intelligent. Second, many database applications need AI algorithms, e.g., image search in a database. AI algorithms can be embedded into the database so that database techniques accelerate them and AI capabilities are provided inside the database. Third, traditional databases focus on general-purpose hardware (e.g., CPUs) and cannot fully utilize new hardware (e.g., ARM, GPUs, AI chips). Moreover, besides the relational model, the tensor model can be used to accelerate AI operations. New techniques are therefore needed to make full use of new hardware. To address these challenges, an AI-native database is designed. On one hand, AI techniques are integrated into the database to provide self-configuring, self-optimizing, self-monitoring, self-diagnosing, self-healing, self-assembling, and self-securing capabilities. On the other hand, the database is enabled to provide AI capabilities through declarative languages, lowering the barrier to using AI. This study introduces five levels of AI-native databases and presents several open challenges in designing one. Autonomous database knob tuning, a deep reinforcement learning based optimizer, machine learning based cardinality estimation, and an autonomous index/view advisor are taken as examples to showcase the advantages of AI-native databases.
[CLC number]
[Funding]
National Natural Science Foundation of China (61925205, 61632016, 61521002); National Key Research and Development Program of China (2015CB358700)