Issue-based LLM Retrieval Augmentation for Generating Supplementary Code Comments

CLC number: TP311

Funding: National Key Research and Development Program of China (2023YFB4503803)
    Abstract:

    As naming conventions and self-describing code become standard practice, traditional summary comments, which largely restate what the code literally says, are losing favor with developers. Developers instead value supplementary code comments that give information beyond the code itself and help with understanding and maintenance. Generating such comments usually requires information sources outside the code, and the supplementary content they carry is complex and varied, which poses significant challenges for existing methods. This study uses Issue discussions among developers as the external information source and proposes an Issue-based retrieval augmentation method that uses large language models (LLMs) to generate supplementary code comments. The method first organizes the supplementary information found in Issues into five types, then uses an LLM to retrieve, from the Issues associated with a code commit, the sentences that potentially contain these types of information, and generates comments from those sentences. It further analyzes the code relevance and Issue verifiability of each generated comment to filter out potential hallucinations. Experiments were conducted on two mainstream LLMs, ChatGPT and GPT-4o. The results show that the method raises the coverage of human-written supplementary comments from 33.6% to 72.2% for ChatGPT and from 35.8% to 88.4% for GPT-4o. The generated comments also provide noticeably more helpful additional information than those of existing methods, which is valuable to developers trying to understand complex code.
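The three stages named in the abstract (retrieve typed Issue sentences, generate a comment from each, filter by code relevance and Issue verifiability) can be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' implementation: the abstract does not name the five information types, so `INFO_TYPES` holds hypothetical placeholders; the prompts are invented; `llm` is any caller-supplied function from prompt text to response text; and the two filter checks are replaced here by a crude token-overlap heuristic.

```python
import re
from typing import Callable, List, Set, Tuple

# Hypothetical placeholders: the abstract says there are five types
# of supplementary information but does not name them.
INFO_TYPES = ["type_1", "type_2", "type_3", "type_4", "type_5"]

# Stand-in for any chat model: prompt text in, response text out.
LLM = Callable[[str], str]


def _tokens(text: str) -> Set[str]:
    """Lowercased identifier-like tokens, used by the overlap heuristic."""
    return set(re.findall(r"[a-z_][a-z0-9_]*", text.lower()))


def retrieve_sentences(llm: LLM, code: str,
                       issue_sentences: List[str]) -> List[Tuple[str, str]]:
    """Keep the Issue sentences that the model labels with one of the types."""
    hits = []
    for sentence in issue_sentences:
        prompt = (f"CLASSIFY the sentence as one of {INFO_TYPES} or 'none'.\n"
                  f"Code:\n{code}\nSentence: {sentence}")
        label = llm(prompt).strip()
        if label in INFO_TYPES:
            hits.append((label, sentence))
    return hits


def generate_comment(llm: LLM, code: str, label: str, sentence: str) -> str:
    """Ask the model for one comment grounded in one retrieved sentence."""
    prompt = (f"COMMENT on the code using only this {label} information.\n"
              f"Code:\n{code}\nSentence: {sentence}")
    return llm(prompt).strip()


def passes_filters(comment: str, code: str, sentence: str) -> bool:
    """Assumed heuristic: the comment must share a token with the code
    (relevance) and with the source sentence (verifiability)."""
    words = _tokens(comment)
    return bool(words & _tokens(code)) and bool(words & _tokens(sentence))


def supplementary_comments(llm: LLM, code: str,
                           issue_sentences: List[str]) -> List[str]:
    """Run retrieval, generation and filtering in sequence."""
    kept = []
    for label, sentence in retrieve_sentences(llm, code, issue_sentences):
        comment = generate_comment(llm, code, label, sentence)
        if passes_filters(comment, code, sentence):
            kept.append(comment)
    return kept
```

Passing the model in as a plain callable keeps the sketch independent of any particular LLM client, and it lets the pipeline be exercised with a deterministic stub. A real system would likely replace the token-overlap filter with a stronger check.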

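Cite this article

Pan Xinglu, Zhao Xianlin, Liu Chenxiao, Zou Yanzhen, Xie Bing. Issue-based LLM Retrieval Augmentation for Generating Supplementary Code Comments. Journal of Software (软件学报), ,():1-23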

History
  • Received: 2024-08-12
  • Last revised: 2024-09-17
  • Published online: 2025-04-23