With the rapid development of emerging technologies, domain-specific software places new demands on development efficiency. Datalog, a declarative programming language with concise syntax and well-defined semantics, helps developers reason about and solve complex problems quickly, and has attracted growing attention in recent years. However, when solving real-world problems, existing single-machine Datalog engines are limited by main-memory capacity and therefore do not scale. To address this problem, this paper designs and implements a Datalog engine based on out-of-core computation. The method first designs a set of out-of-core operators needed to evaluate Datalog programs, and then translates a Datalog program into a C++ program that uses these operators. It then introduces a hash-based partitioning strategy and a minimum-replacement scheduling strategy based on search-tree pruning, which schedule the resulting partition files for computation and produce the final results. Based on this method, the prototype tool DDL (Disk-Based DataLog Engine) is implemented and evaluated with widely used real-world Datalog programs on both synthetic and real-world datasets. The experimental results show that DDL achieves good performance and high scalability.
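The abstract does not spell out DDL's operators or partitioning scheme. The following is only a minimal C++ sketch, under stated assumptions, of why hash partitioning on the join column lets a recursive Datalog program be evaluated one partition pair at a time. It evaluates transitive closure semi-naively. The names `hash_partition` and `transitive_closure` are hypothetical. In-memory vectors stand in for on-disk partition files, and the derived-tuple set is kept in memory for brevity. The minimum-replacement scheduling strategy is not modeled: partitions are simply visited in index order.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <set>
#include <utility>
#include <vector>

// Hypothetical illustration: each inner vector stands in for one partition file.
using Tuple = std::pair<int, int>;
using Partitions = std::vector<std::vector<Tuple>>;

// Route each tuple to partition hash(key) % n, where key is the join column.
// Tuples that share a join key always land in the same partition.
Partitions hash_partition(const std::vector<Tuple>& rel, std::size_t n, bool key_is_first) {
    assert(n > 0);
    Partitions parts(n);
    for (const Tuple& t : rel) {
        int key = key_is_first ? t.first : t.second;
        parts[std::hash<int>{}(key) % n].push_back(t);
    }
    return parts;
}

// Semi-naive evaluation of
//   path(x, y) :- edge(x, y).
//   path(x, z) :- path(x, y), edge(y, z).
// Only partition i of delta(path) is joined with partition i of edge, so a
// single partition pair is needed per join step. (Assumption: the full path
// set stays in memory here purely to keep the sketch short.)
std::set<Tuple> transitive_closure(const std::vector<Tuple>& edge, std::size_t n) {
    Partitions edge_parts = hash_partition(edge, n, /*key_is_first=*/true);
    std::set<Tuple> path(edge.begin(), edge.end());
    std::vector<Tuple> delta(path.begin(), path.end());
    while (!delta.empty()) {
        // path(x, y) is partitioned on y to match edge(y, z) partitioned on y.
        Partitions delta_parts = hash_partition(delta, n, /*key_is_first=*/false);
        std::vector<Tuple> next;
        for (std::size_t i = 0; i < n; ++i) {
            // Nested-loop join within one matching partition pair.
            for (const Tuple& p : delta_parts[i])
                for (const Tuple& e : edge_parts[i])
                    if (p.second == e.first) {
                        Tuple derived{p.first, e.second};
                        // Keep only genuinely new facts for the next iteration.
                        if (path.insert(derived).second) next.push_back(derived);
                    }
        }
        delta = std::move(next);
    }
    return path;
}
```

For example, `transitive_closure({{1, 2}, {2, 3}, {3, 4}}, 4)` derives the six `path` facts of the chain, and the result does not depend on the number of partitions, because matching join keys always hash to the same partition index.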