代码克隆检测研究进展
作者:
作者单位:

作者简介:

陈秋远(1995-),男,四川成都人,博士生,主要研究领域为智能软件工程,软件仓库挖掘;鄢萌(1989-),男,博士,助理研究员,CCF专业会员,主要研究领域为智能软件工程,软件仓库挖掘,软件维护与演化;李善平(1963-),男,博士,教授,博士生导师,CCF高级会员,主要研究领域为分布式计算,软件工程,操作系统内核;夏鑫(1986-),男,博士,讲师,博士生导师,CCF专业会员,主要研究领域为软件仓库挖掘,经验软件工程.

通讯作者:

鄢萌,E-mail:mengy@zju.edu.cn

基金项目:


Code Clone Detection: A Literature Review
Author:
Affiliation:

Fund Project:

  • 摘要
  • |
  • 图/表
  • |
  • 访问统计
  • |
  • 参考文献
  • |
  • 相似文献
  • |
  • 引证文献
  • |
  • 资源附件
    摘要:

    代码克隆(code clone),是指存在于代码库中两个及以上相同或者相似的源代码片段.代码克隆相关问题是软件工程领域研究的重要课题.代码克隆是软件开发中的常见现象,它能够提高效率,产生一定的正面效益.但是研究表明,代码克隆也会对软件系统的开发、维护产生负面的影响,包括降低软件稳定性,造成代码库冗余和软件缺陷传播等.代码克隆检测技术旨在寻找检测代码克隆的自动化方法,从而用较低成本减少代码克隆的负面效应.研究者们在代码克隆检测方面获得了一系列的检测技术成果,根据这些技术利用源代码信息的程度不同,可以将它们分为基于文本、词汇、语法、语义4个层次.现有的检测技术针对文本相似的克隆取得了有效的检测结果,但同时也面临着更高抽象层次克隆的挑战,亟待更先进的理论、技术来解决.着重从源代码表征方式角度入手,对近年来代码克隆检测研究进展进行了梳理和总结.主要内容包括:(1)根据源代码表征方式阐述并归类了现有的克隆检测方法;(2)总结了模型评估中使用的实验验证方法与性能评估指标;(3)从科学性、实用性和技术难点这3个方面归纳总结了代码克隆研究的关键问题,围绕数据标注、表征方法、模型构建和工程实践4个方面,阐述了问题的可能解决思路和研究的未来发展趋势.

    Abstract:

    Code clone refers to more than two duplicate or similar code fragments existing in a software system. Code clone is a common phenomenon during software development which can facilitate development and has positive impacts on software system. However, research shows that code clone will also do harm to the development and maintenance of software system, including but not limited to the decline of stability, redundancy of source code repository, and propagation of software defects. Code clone is one of the most active research areas in software engineering. Therefore, various detection techniques are proposed to automatically detect code clone in software systems, which help improve software quality. There are a lot of achievements in this area, and these techniques can be categorized to text-based, lexis-based, syntax-based, and semantic-based categories. Current techniques have obtained effective results in text-based clone detection, but still challenges in detecting other types of code clone. More advanced and unified theoretic and technical guidelines are needed to improve code clone detection techniques. Therefore, in this paper, a literature review for code detection is presented especially from the perspective of source code representation. In summary, the contributions of this study are:(1) current code clone detection techniques are concluded and classified from the perspective of code representation; (2) the model validation and performance measures in model evaluation are concluded; and (3) the key issues of code clone research are summarized from three aspects:scientific, practical, and technical difficulties. The possible solutions to the problems and the future development of the research are elaborated, focusing on data annotation, characterization methods, model construction, and engineering practice.

    参考文献
    相似文献
    引证文献
引用本文

陈秋远,李善平,鄢萌,夏鑫.代码克隆检测研究进展.软件学报,2019,30(4):962-980

复制
分享
文章指标
  • 点击次数:
  • 下载次数:
历史
  • 收稿日期:2018-08-21
  • 最后修改日期:2018-10-07
  • 录用日期:
  • 在线发布日期: 2019-04-01
您是第位访问者
版权所有:中国科学院软件研究所 京ICP备05046678号-3
地址:北京市海淀区中关村南四街4号,邮政编码:100190
电话:010-62562563 传真:010-62562533 Email:jos@iscas.ac.cn
技术支持:北京勤云科技发展有限公司

京公网安备 11040202500063号