代码变更是软件演化过程中的关键行为, 其质量与软件质量密切相关. 对代码变更进行建模和表示是众多软件工程任务的基础, 例如即时缺陷预测、软件制品可追溯性恢复等. 近年来, 代码变更表示学习技术得到了广泛的关注与应用. 该类技术旨在学习将代码变更的语义信息表示为稠密低维实值向量, 即学习代码变更的分布式表示, 相比于传统的人工设计代码变更特征的方法具有自动学习、端到端训练和表示准确等优点. 但同时该领域目前也存在如结构信息利用困难、基准数据集缺失等挑战. 对近期代码变更表示学习技术的研究及应用进展进行了梳理和总结, 主要内容包括: (1)介绍了代码变更表示学习及其应用的一般框架. (2)梳理了现有的代码变更表示学习技术, 总结了不同技术的优缺点. (3)总结并归类了代码变更表示学习技术的下游应用. (4)归纳了代码变更表示学习技术现存的挑战和潜在的机遇, 展望了该类技术的未来发展方向.
Code change is a kind of key behavior in software evolution, and its quality has a large impact on software quality. Modeling and representing code changes is the basis of many software engineering tasks, such as just-in-time defect prediction and recovery of software product traceability. The representation learning technologies for code changes have attracted extensive attention and have been applied to diverse applications in recent years. This type of technology targets at learning to represent the semantic information in code changes as low-dimensional dense real-valued vectors, namely, learning the distributed representation of code changes. Compared with the conventional methods of manually designing code change features, such technologies offers the advantages of automatic learning, end-to-end training, and accurate representation. However, this field is still faced with some challenges, such as great difficulties in utilizing structural information and the absence of benchmark datasets. This study surveys and summarizes the recent progress of studies and applications of representation learning technologies for code changes, and it mainly consists of the following four parts. (1) The study presents the general framework of representation learning of code changes and its application. (2) Subsequently, it reviews the currently available representation learning technologies for code changes and summarizes their respective advantages and disadvantages. (3) Then, the downstream applications of such technologies are summarized and classified. (4) Finally, this study discusses the challenges and potential opportunities ahead of representation learning technologies for code changes and suggests the directions for the future development of this type of technology.