LLM-powered Datalog Code Translation and Incremental Program Analysis Framework

Corresponding author: BU Lei (卜磊), E-mail: bulei@nju.edu.cn

CLC number: TP311

Fund project: National Natural Science Foundation of China (62232008, 62172200); Jiangsu Province Frontier-Leading Technology Basic Research Project (BK20202001)

    Abstract:

    Datalog is a declarative logic programming language that has been widely adopted across diverse domains. In recent years, interest in Datalog has surged in both academia and industry, leading to the design and development of many Datalog engines and their corresponding dialects. The proliferation of dialects, however, creates a problem: code written in one Datalog dialect generally cannot run on an engine for another dialect, so adopting a new Datalog engine requires translating existing Datalog code into the new dialect. Current approaches to Datalog code translation fall into two categories, manually rewriting code and manually designing translation rules, and both are time-consuming, involve substantial repetitive work, and lack flexibility and extensibility. This paper proposes an LLM-powered Datalog code translation technique that leverages the code understanding and generation capabilities of large language models (LLMs). It combines a divide-and-conquer translation strategy, prompt engineering based on few-shot and chain-of-thought (CoT) prompting, and an iterative check-feedback-repair error-correction mechanism to achieve high-precision translation between different Datalog dialects, reducing the effort developers spend repeatedly writing translation rules. Building on this translation technique, we design and implement a general declarative incremental program analysis framework based on Datalog. We evaluate the proposed translation technique on different Datalog dialect pairs, and the results confirm its effectiveness. We also evaluate the incremental program analysis framework experimentally, demonstrating the speedup that incremental analysis achieves when built on the proposed translation technique.
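The abstract names three ingredients: divide-and-conquer translation, few-shot plus chain-of-thought prompting, and a check-feedback-repair loop. The following is a minimal Python sketch of how these pieces can fit together; it is not the authors' implementation. The `llm` and `check` callables are hypothetical stand-ins (a text-completion function and, say, a wrapper around the target engine's parser), splitting on blank lines is an assumed chunk granularity, the `CODE:` reply marker is an assumed convention, and all prompt wording is illustrative.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

# Hypothetical interfaces (assumptions, not taken from the paper):
#   LLM:     prompt -> model reply (any text-completion function)
#   Checker: candidate code -> error messages, [] meaning "accepted"
LLM = Callable[[str], str]
Checker = Callable[[str], List[str]]
Examples = List[Tuple[str, str]]  # few-shot (source, target) pairs


@dataclass
class ChunkResult:
    source: str
    translated: str
    rounds: int  # number of LLM calls spent on this chunk
    errors: List[str] = field(default_factory=list)  # errors left at the end


def split_program(program: str) -> List[str]:
    """Divide step (assumed granularity): blank-line-separated blocks."""
    return [blk.strip() for blk in program.split("\n\n") if blk.strip()]


def extract_code(reply: str) -> str:
    """Keep only the text after the last 'CODE:' marker, dropping CoT reasoning."""
    return reply.rsplit("CODE:", 1)[-1].strip()


def build_prompt(chunk: str, examples: Examples, src: str, dst: str) -> str:
    """Few-shot prompt plus a chain-of-thought cue (wording is illustrative)."""
    shots = "\n\n".join(f"{src}:\n{a}\n{dst}:\n{b}" for a, b in examples)
    return (
        f"Translate {src} Datalog into {dst} Datalog.\n\n{shots}\n\n"
        f"{src}:\n{chunk}\n"
        "First reason step by step about which constructs differ, "
        "then give the translation after a line starting with CODE:\n"
    )


def translate_chunk(chunk: str, llm: LLM, check: Checker, examples: Examples,
                    src: str, dst: str, max_rounds: int = 3) -> ChunkResult:
    code = extract_code(llm(build_prompt(chunk, examples, src, dst)))
    rounds, errors = 1, check(code)
    # Check-feedback-repair: send the checker's errors back to the model
    # until the code is accepted or the round budget is used up.
    while errors and rounds < max_rounds:
        repair_prompt = (
            f"The following {dst} Datalog code was translated from:\n{chunk}\n"
            f"It is rejected by the checker:\n{code}\n"
            "Errors:\n" + "\n".join(errors) + "\n"
            "Fix it and give the corrected code after a line starting with CODE:\n"
        )
        code = extract_code(llm(repair_prompt))
        rounds, errors = rounds + 1, check(code)
    return ChunkResult(chunk, code, rounds, errors)


def translate_program(program: str, llm: LLM, check: Checker,
                      examples: Examples, src: str, dst: str,
                      max_rounds: int = 3) -> Tuple[str, List[ChunkResult]]:
    """Conquer step: translate each chunk independently, then reassemble."""
    results = [translate_chunk(c, llm, check, examples, src, dst, max_rounds)
               for c in split_program(program)]
    return "\n\n".join(r.translated for r in results), results
```

Keeping `llm` and `check` as plain callables leaves the model client and the target engine abstract; chunks whose `errors` list is still non-empty after `max_rounds` can be flagged for manual review rather than silently merged.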

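Cite this article

WANG Xizao, SHEN Tianqi, BIN Xiangrong, BU Lei. LLM-powered Datalog code translation and incremental program analysis framework. Journal of Software (软件学报), 2025, 36(6):0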

History
  • Received: 2024-08-26
  • Last revised: 2024-10-14
  • Published online: 2024-12-10
Copyright: Institute of Software, Chinese Academy of Sciences
Address: No. 4 South Fourth Street, Zhongguancun, Haidian District, Beijing 100190, China
Tel: 010-62562563  Fax: 010-62562533  Email: jos@iscas.ac.cn