Research Progress of Code Change Representation Learning and Its Application

doi:10.13328/j.cnki.jos.006749

Journal issue: 2023, Vol. 34, No. 12, pp. 5501-5526

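Authors:
LIU Zhong-Xin, College of Computer Science and Technology, Zhejiang University, Hangzhou 310027, China
TANG Zhi-Jie, College of Computer Science and Technology, Zhejiang University, Hangzhou 310027, China
XIA Xin, Software Engineering Application Technology Lab, Huawei Technologies Co. Ltd., Hangzhou 310007, China
LI Shan-Ping, College of Computer Science and Technology, Zhejiang University, Hangzhou 310027, China

Author biographies: LIU Zhong-Xin (1994-), male, Ph.D., distinguished research fellow, CCF professional member; research interests: intelligent software engineering and mining software repositories. TANG Zhi-Jie (1999-), male, master's student; research interest: intelligent software engineering. XIA Xin (1986-), male, Ph.D., CCF professional member; research interests: mining software repositories and empirical software engineering. LI Shan-Ping (1963-), male, Ph.D., professor, doctoral supervisor, CCF senior member; research interests: distributed computing, software engineering, and the Linux kernel.
Corresponding author: XIA Xin, E-mail: xin.xia@acm.org
Fund project: Qizhen Talent Fund of the Zhejiang University Education Foundation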

Abstract:

Code changes are a key activity in software evolution, and their quality is closely tied to software quality. Modeling and representing code changes underpins many software engineering tasks, such as just-in-time defect prediction and software artifact traceability recovery. In recent years, representation learning techniques for code changes have attracted wide attention and have been applied to diverse tasks. These techniques aim to learn to represent the semantic information in a code change as a dense, low-dimensional, real-valued vector, that is, to learn a distributed representation of the code change. Compared with conventional approaches that rely on manually designed code change features, they offer automatic learning, end-to-end training, and accurate representation. The field nevertheless still faces challenges, such as the difficulty of exploiting structural information and the lack of benchmark datasets. This paper surveys and summarizes recent research on representation learning techniques for code changes and their applications. It consists of four parts: (1) it presents the general framework of code change representation learning and its application; (2) it reviews existing code change representation learning techniques and summarizes their respective advantages and disadvantages; (3) it summarizes and categorizes the downstream applications of these techniques; (4) it identifies the open challenges and potential opportunities for these techniques and outlines directions for their future development.

Key words: code change; representation learning; code change representation; software evolution; software maintenance
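
The contrast the abstract draws between manually designed features and a dense vector representation can be made concrete with a toy sketch. Everything in it is an assumption made for illustration and is not taken from any surveyed technique: the function names, the 8-dimensional size, the simple unified-diff parsing, and above all the hash-derived token vectors, which merely stand in for an embedding table that a real model would learn end to end.

```python
import hashlib
import math
import re

DIM = 8  # illustrative size only; learned representations are typically much larger


def tokenize(line):
    """Split a source line into identifier, number, and punctuation tokens."""
    return re.findall(r"[A-Za-z_]\w*|\d+|[^\s\w]", line)


def token_vector(token):
    """Deterministic pseudo-embedding from a hash (stand-in for a learned table)."""
    digest = hashlib.sha256(token.encode("utf-8")).digest()
    return [b / 127.5 - 1.0 for b in digest[:DIM]]  # map each byte to [-1, 1]


def mean_pool(tokens):
    """Average the token vectors; an empty token list yields the zero vector."""
    if not tokens:
        return [0.0] * DIM
    vecs = [token_vector(t) for t in tokens]
    return [sum(col) / len(vecs) for col in zip(*vecs)]


def split_diff(diff):
    """Return (added_lines, deleted_lines) of a unified diff, skipping file headers."""
    added, deleted = [], []
    for ln in diff.splitlines():
        if ln.startswith("+") and not ln.startswith("+++"):
            added.append(ln[1:])
        elif ln.startswith("-") and not ln.startswith("---"):
            deleted.append(ln[1:])
    return added, deleted


def handcrafted_features(diff):
    """Manually designed features: counts of added and deleted lines."""
    added, deleted = split_diff(diff)
    return {"lines_added": len(added), "lines_deleted": len(deleted)}


def change_vector(diff):
    """Dense vector: pooled added-code vector minus pooled deleted-code vector."""
    added, deleted = split_diff(diff)
    a = mean_pool([t for ln in added for t in tokenize(ln)])
    d = mean_pool([t for ln in deleted for t in tokenize(ln)])
    return [x - y for x, y in zip(a, d)]


def cosine(u, v):
    """Cosine similarity of two vectors; 0.0 if either has zero norm."""
    nu = math.sqrt(sum(x * x for x in u))
    nv = math.sqrt(sum(x * x for x in v))
    if nu == 0.0 or nv == 0.0:
        return 0.0
    return sum(x * y for x, y in zip(u, v)) / (nu * nv)


DIFF = """\
--- a/util.py
+++ b/util.py
@@ -1,2 +1,2 @@
 def area(w, h):
-    return w + h
+    return w * h
"""

print(handcrafted_features(DIFF))  # {'lines_added': 1, 'lines_deleted': 1}
print(change_vector(DIFF))         # a list of DIM real values
```

The line counts treat this change like any other one-line replacement, whereas the vector depends on which tokens were swapped. In a learned model that dependence is trained so that a downstream task, for example comparing two changes with `cosine`, can exploit it; the hash-based vectors here carry no such semantics.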

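Cite this article

LIU Zhong-Xin, TANG Zhi-Jie, XIA Xin, LI Shan-Ping. Research progress of code change representation learning and its application. Journal of Software (软件学报), 2023, 34(12): 5501-5526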

History

Received: 2021-12-23
Last revised: 2022-04-21
Published online: 2022-10-26
Published: 2023-12-06
