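Research Progress of Pre-trained Model in Software Engineering

CLC number: TP311

Fund projects: National Natural Science Foundation of China (62202223); Natural Science Foundation of Jiangsu Province (BK20220881); Open Project of the Key Laboratory of Safety-Critical Software Development and Verification (Nanjing University of Aeronautics and Astronautics), Ministry of Industry and Information Technology (NJ2022027)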

    Abstract:

    In recent years, deep learning has achieved excellent performance on software engineering (SE) tasks. In practice, such performance depends on large-scale training sets, and collecting and labeling them consumes substantial resources, which limits the wide application of deep learning techniques to real-world tasks. With the release of pre-trained models (PTMs) in the deep learning field, SE researchers in China and abroad have paid wide attention to introducing PTMs into SE tasks. PTMs have brought a qualitative leap on these tasks, moving intelligent software engineering into a new era. However, no study has yet distilled the successes, failures, and opportunities of pre-trained models in SE. To clarify the work in this cross-disciplinary area (pre-trained models for software engineering, PTM4SE), this study systematically reviews current PTM4SE research. Specifically, it first presents a framework of PTM-based intelligent software engineering methods and then analyzes the pre-trained model techniques commonly used in SE. It next introduces in detail the SE downstream tasks that use pre-trained models and compares and analyzes the performance of pre-trained model techniques on these tasks. The study then presents the SE datasets commonly used for training and fine-tuning PTMs. Finally, it discusses the challenges and opportunities of using PTMs in SE. The collected PTMs and commonly used datasets in SE are published at https://github.com/OpenSELab/PTM4SE.

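Cite this article

Gong Lina, Zhou Yiren, Qiao Yu, Jiang Shujuan, Wei Mingqiang, Huang Zhiqiu. Research Progress of Pre-trained Model in Software Engineering. Journal of Software, 2025, 36(1): 1-26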

History
  • Received: 2023-02-06
  • Last revised: 2023-06-21
  • Published online: 2024-06-18