Large Language Model-Based Decomposition of Long Methods

Corresponding author:

Yanjie Jiang (姜艳杰), E-mail: yanjiejiang@pku.edu.cn

CLC number:

TP311

Funding:

National Natural Science Foundation of China (62172037, 62232003); China Postdoctoral Science Foundation, 74th batch, General Program (2023M740078)

    Abstract:

    Long methods, like other code smells, keep software applications from reaching their best readability, reusability, and maintainability. Consequently, the automated detection and decomposition of long methods have been studied extensively. Although existing approaches greatly facilitate decomposition, their solutions often differ substantially from the optimal ones. To this end, we investigated the automatable portion of a publicly available dataset of real-world long methods and examined how those methods were decomposed. Based on the findings, we propose a new large language model-based approach, called Lsplitter, for automatically decomposing long methods. For a given long method, Lsplitter uses heuristic rules and a large language model to decompose it into a series of shorter methods. Large language models, however, often split out similar methods. To address this, Lsplitter applies a location-based algorithm to the model's output that merges physically contiguous and highly similar methods into a single longer method. Finally, it ranks the resulting candidates. We evaluated Lsplitter on 2849 long methods from real-world Java projects. The results show that Lsplitter improves the hit rate by 142% over a traditional approach that incorporates a modularity matrix, and by 7.6% over a purely large language model-based approach.
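
    The merging step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not say how similarity is measured or what threshold is used, so the `Fragment` type, the token-level Jaccard similarity, and the 0.5 threshold below are all assumptions made for illustration only.

```python
from __future__ import annotations

import re
from dataclasses import dataclass


@dataclass
class Fragment:
    """Hypothetical candidate method: an inclusive line range plus its source text."""
    start: int
    end: int
    code: str


def tokens(code: str) -> set[str]:
    # Identifier-level tokens; a crude stand-in for whatever similarity
    # measure the real tool uses (assumption).
    return set(re.findall(r"[A-Za-z_]\w*", code))


def jaccard(a: str, b: str) -> float:
    """Jaccard similarity of the two token sets (1.0 if both are empty)."""
    ta, tb = tokens(a), tokens(b)
    if not ta and not tb:
        return 1.0
    return len(ta & tb) / len(ta | tb)


def merge_adjacent(fragments: list[Fragment], threshold: float = 0.5) -> list[Fragment]:
    """Merge fragments that are physically contiguous and highly similar.

    Each fragment is compared with the (possibly already merged) fragment
    that directly precedes it in the source.
    """
    merged: list[Fragment] = []
    for frag in sorted(fragments, key=lambda f: f.start):
        prev = merged[-1] if merged else None
        contiguous = prev is not None and frag.start == prev.end + 1
        if contiguous and jaccard(prev.code, frag.code) >= threshold:
            merged[-1] = Fragment(prev.start, frag.end, prev.code + "\n" + frag.code)
        else:
            merged.append(frag)
    return merged


# Three candidate fragments, as a language model might propose them (made-up data).
candidates = [
    Fragment(1, 2, "int total = 0;\nfor (Item item : items) total += item.price;"),
    Fragment(3, 4, "int count = 0;\nfor (Item item : items) count += item.quantity;"),
    Fragment(5, 6, "logger.info(report);\nwriter.flush();"),
]
result = merge_adjacent(candidates)
print([(f.start, f.end) for f in result])  # prints [(1, 4), (5, 6)]
```

    In this made-up input, the first two fragments are adjacent and share most of their tokens, so they are merged into one longer candidate, while the third is adjacent but dissimilar and stays separate. A real tool would presumably work on a parsed Java method rather than raw text, and would still need the ranking step that the abstract mentions.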

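Cite this article

Zimao Xu, Yanjie Jiang, Yuxia Zhang, Hui Liu. Large Language Model-Based Decomposition of Long Methods. Journal of Software (软件学报), 2025, 36(6):0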

History
  • Received: 2024-08-26
  • Last revised: 2024-10-14
  • Published online: 2024-12-10