Function Template Mining and Retrieval Based on Code Clone Difference Analysis
CLC number: TP311

Funding: National Natural Science Foundation of China (62172099)



    Abstract:

    In software engineering, code repositories hold a wealth of knowledge and give developers concrete examples of programming practice. If the patterned, repetitive fragments that occur frequently in source code can be distilled into code templates, programming efficiency can improve significantly. In current practice, developers often reuse existing solutions by searching source code, but such searches typically return many similar, redundant results, which increases the burden of subsequent filtering. Meanwhile, template mining techniques based on cloned code often fail to cover the broader patterns formed by dispersed small clone fragments, which limits the practicality of the resulting templates. This study proposes a new method for extracting and retrieving code templates based on code clone detection. By stitching together multiple fragment-level clones, and by extracting and aggregating the shared parts of method-level clones, the method extracts function-level code templates more efficiently and addresses the issue of template quality. Based on the mined templates, this study proposes a triplet representation of code structural features that effectively supplements plain-text features and yields an efficient, concise structural representation. In addition, it presents a template retrieval method that combines structural and textual search, so that templates can be retrieved by matching features of the programming context. CodeSculptor, a tool implementing this method, demonstrates a strong ability to extract high-quality code templates in tests on a codebase of 45 high-quality open-source Java projects. The results show that the mined templates reduce code volume by 60.87% on average, and that 92.09% of them are produced by stitching fragment-level clones; templates of this kind cannot be identified by traditional methods. These results support the method's effectiveness in recognizing and constructing code templates. In the code template retrieval and recommendation experiment, the precision of the top-5 results reaches 96.87%. A preliminary case study of 9600 randomly selected templates finds that most sampled templates are semantically complete and coherent, which supports their practicality; the few meaningless templates indicate room to refine the template extraction strategy. A user study further shows that development tasks can be completed more efficiently with CodeSculptor.
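
    The abstract does not define the triplet representation or the way structural and textual scores are combined, so the following is only an illustrative sketch of the general idea. It assumes a hypothetical triplet of the form (parent node type, relation, child node type), Jaccard similarity for both feature kinds, and an assumed mixing weight alpha; none of these names or choices are taken from CodeSculptor.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Tuple

# Hypothetical triplet shape: (parent node type, relation, child node type).
Triplet = Tuple[str, str, str]


@dataclass(frozen=True)
class Template:
    name: str
    tokens: FrozenSet[str]        # textual features, e.g. identifiers and API names
    triplets: FrozenSet[Triplet]  # structural features


def jaccard(a: frozenset, b: frozenset) -> float:
    """Jaccard similarity; two empty sets are treated as having no overlap."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def score(query: Template, candidate: Template, alpha: float = 0.5) -> float:
    """Weighted mix of structural and textual similarity (alpha is an assumed weight)."""
    structural = jaccard(query.triplets, candidate.triplets)
    textual = jaccard(query.tokens, candidate.tokens)
    return alpha * structural + (1 - alpha) * textual


def retrieve(query: Template, corpus: List[Template], k: int = 5) -> List[Template]:
    """Return the k highest-scoring templates for the given programming context."""
    return sorted(corpus, key=lambda t: score(query, t), reverse=True)[:k]


# Toy corpus of two made-up templates.
corpus = [
    Template("read_file", frozenset({"reader", "readLine", "close"}),
             frozenset({("Method", "contains", "Try"), ("Try", "contains", "While")})),
    Template("sum_list", frozenset({"total", "items"}),
             frozenset({("Method", "contains", "For"), ("For", "contains", "Assign")})),
]
# Features extracted from the code the developer is currently writing (made up).
context = Template("context", frozenset({"reader", "readLine"}),
                   frozenset({("Method", "contains", "Try")}))
top = retrieve(context, corpus, k=1)
```

    In this toy setup the programming context shares one triplet and two tokens with the read_file template and nothing with sum_list, so read_file ranks first. A set-based structural feature of this kind keeps the comparison cheap and lets structure contribute even when identifiers differ, which is one plausible reading of how structural features could supplement plain-text features.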

Cite this article

肖泉彬, 陈源, 吴毅坚, 彭鑫. Function Template Mining and Retrieval Based on Code Clone Difference Analysis. 软件学报 (Journal of Software): 1-22.

History
  • Received: 2024-01-26
  • Last revised: 2024-04-07
  • Published online: 2024-07-03