代码变更表示学习及其应用研究进展
作者:
作者单位:

作者简介:

刘忠鑫(1994-),男,博士,特聘研究员,CCF专业会员,主要研究领域为智能软件工程,软件仓库挖掘.;唐郅杰(1999-),男,硕士生,主要研究领域为智能软件工程.;夏鑫(1986-),男,博士,CCF专业会员,主要研究领域为软件仓库挖掘,经验软件工程.;李善平(1963-),男,博士,教授,博士生导师,CCF高级会员,主要研究领域为分布式计算,软件工程,Linux内核.

通讯作者:

夏鑫,E-mail:xin.xia@acm.org

中图分类号:

基金项目:

浙江大学教育基金会启真人才基金


Research Progress of Code Change Representation Learning and Its Application
Author:
Affiliation:

Fund Project:

  • 摘要
  • |
  • 图/表
  • |
  • 访问统计
  • |
  • 参考文献
  • |
  • 相似文献
  • |
  • 引证文献
  • |
  • 资源附件
  • |
  • 文章评论
    摘要:

    代码变更是软件演化过程中的关键行为, 其质量与软件质量密切相关. 对代码变更进行建模和表示是众多软件工程任务的基础, 例如即时缺陷预测、软件制品可追溯性恢复等. 近年来, 代码变更表示学习技术得到了广泛的关注与应用. 该类技术旨在学习将代码变更的语义信息表示为稠密低维实值向量, 即学习代码变更的分布式表示, 相比于传统的人工设计代码变更特征的方法具有自动学习、端到端训练和表示准确等优点. 但同时该领域目前也存在如结构信息利用困难、基准数据集缺失等挑战. 对近期代码变更表示学习技术的研究及应用进展进行了梳理和总结, 主要内容包括: (1)介绍了代码变更表示学习及其应用的一般框架. (2)梳理了现有的代码变更表示学习技术, 总结了不同技术的优缺点. (3)总结并归类了代码变更表示学习技术的下游应用. (4)归纳了代码变更表示学习技术现存的挑战和潜在的机遇, 展望了该类技术的未来发展方向.

    Abstract:

    Code change is a kind of key behavior in software evolution, and its quality has a large impact on software quality. Modeling and representing code changes is the basis of many software engineering tasks, such as just-in-time defect prediction and recovery of software product traceability. The representation learning technologies for code changes have attracted extensive attention and have been applied to diverse applications in recent years. This type of technology targets at learning to represent the semantic information in code changes as low-dimensional dense real-valued vectors, namely, learning the distributed representation of code changes. Compared with the conventional methods of manually designing code change features, such technologies offers the advantages of automatic learning, end-to-end training, and accurate representation. However, this field is still faced with some challenges, such as great difficulties in utilizing structural information and the absence of benchmark datasets. This study surveys and summarizes the recent progress of studies and applications of representation learning technologies for code changes, and it mainly consists of the following four parts. (1) The study presents the general framework of representation learning of code changes and its application. (2) Subsequently, it reviews the currently available representation learning technologies for code changes and summarizes their respective advantages and disadvantages. (3) Then, the downstream applications of such technologies are summarized and classified. (4) Finally, this study discusses the challenges and potential opportunities ahead of representation learning technologies for code changes and suggests the directions for the future development of this type of technology.

    参考文献
    相似文献
    引证文献
引用本文

刘忠鑫,唐郅杰,夏鑫,李善平.代码变更表示学习及其应用研究进展.软件学报,2023,34(12):5501-5526

复制
分享
文章指标
  • 点击次数:
  • 下载次数:
  • HTML阅读次数:
  • 引用次数:
历史
  • 收稿日期:2021-12-23
  • 最后修改日期:2022-04-21
  • 录用日期:
  • 在线发布日期: 2022-10-26
  • 出版日期: 2023-12-06
文章二维码
您是第位访问者
版权所有:中国科学院软件研究所 京ICP备05046678号-3
地址:北京市海淀区中关村南四街4号,邮政编码:100190
电话:010-62562563 传真:010-62562533 Email:jos@iscas.ac.cn
技术支持:北京勤云科技发展有限公司

京公网安备 11040202500063号