Automatic Generation of Source Code Comments Model Based on Pointer-generator Network

doi:10.13328/j.cnki.jos.006270


Volume 32, Issue 7, 2021, pp. 2142-2165. DOI: 10.13328/j.cnki.jos.006270


Author:
NIU Chang-An
State Key Laboratory for Novel Software Technology (Nanjing University), Nanjing 210023, China; Software Institute, Nanjing University, Nanjing 210093, China
GE Ji-Dong
State Key Laboratory for Novel Software Technology (Nanjing University), Nanjing 210023, China; Software Institute, Nanjing University, Nanjing 210093, China
TANG Ze
State Key Laboratory for Novel Software Technology (Nanjing University), Nanjing 210023, China; Software Institute, Nanjing University, Nanjing 210093, China
LI Chuan-Yi
State Key Laboratory for Novel Software Technology (Nanjing University), Nanjing 210023, China; Software Institute, Nanjing University, Nanjing 210093, China
ZHOU Yu
College of Computer Science and Technology, Nanjing University of Aeronautics and Astronautics, Nanjing 211106, China
LUO Bin
State Key Laboratory for Novel Software Technology (Nanjing University), Nanjing 210023, China; Software Institute, Nanjing University, Nanjing 210093, China

                    
Fund Project: National Natural Science Foundation of China (61802167, 61972197, 61802095); Natural Science Foundation of Jiangsu Province, China (BK20201250); Cooperation Fund of Huawei-NJU Creative Laboratory for the Next Programming


Abstract:

Code comments play an important role in software quality assurance: they improve the readability of source code and make it easier to understand, reuse, and maintain. However, for various reasons, developers sometimes omit necessary comments, so later developers waste considerable time understanding the source code, which greatly reduces the efficiency of software maintenance. In recent years, much work has used machine learning to automatically generate comments for source code. These methods extract information such as the code sequence and structure and then use a sequence-to-sequence (seq2seq) neural model to generate the corresponding comments, achieving sound results. However, Hybrid-DeepCom, the state-of-the-art code comment generation model, is still deficient in two respects. First, it may break the code structure during preprocessing, which makes the input information inconsistent across instances and impairs model learning. Second, owing to the limitations of the seq2seq model, it cannot generate out-of-vocabulary (OOV) words in the comment. For example, variable names, method names, and other identifiers that appear very infrequently in the source code are usually OOV words, and without them comments are difficult to understand. To solve these problems, this study proposes an automatic comment generation model named CodePtr. On the one hand, a complete source code encoder is added to solve the problem of the code structure being broken. On the other hand, a pointer-generator network module is introduced to switch automatically between generating a word and copying a word at each decoding step; in particular, when the model encounters an identifier that occurs only a few times in the input, it can copy the identifier directly to the output, which solves the problem of not being able to generate OOV words.
Finally, this study compares CodePtr with Hybrid-DeepCom through experiments on large data sets. The results show that, with a vocabulary size of 30 000, CodePtr improves translation performance metrics by 6% on average and improves OOV word handling by nearly 50%, which demonstrates the effectiveness of the CodePtr model.
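
The switch between generating and copying at a decoding step can be illustrated with a small sketch. The abstract does not give CodePtr's exact equations, so the code below assumes the standard pointer-generator formulation, in which a gate p_gen mixes the decoder's vocabulary distribution with the attention weights over the input tokens. The function name, the toy tokens, and all probabilities are hypothetical values chosen for illustration, not taken from the paper.

```python
from typing import Dict, List


def pointer_generator_step(
    p_gen: float,
    vocab_dist: Dict[str, float],
    attention: List[float],
    source_tokens: List[str],
) -> Dict[str, float]:
    """Mix the generate and copy distributions for one decoding step.

    Assumed (standard) formulation, not confirmed by the abstract:
        P(w) = p_gen * P_vocab(w) + (1 - p_gen) * sum of attention on copies of w
    """
    # Generate mode: scale the fixed-vocabulary distribution by p_gen.
    final = {word: p_gen * prob for word, prob in vocab_dist.items()}
    # Copy mode: add attention mass for each input token, including OOV tokens.
    for token, weight in zip(source_tokens, attention):
        final[token] = final.get(token, 0.0) + (1.0 - p_gen) * weight
    return final


# Hypothetical decoding step: "getUserId" is not in the fixed vocabulary.
vocab_dist = {"returns": 0.6, "the": 0.3, "<unk>": 0.1}
source_tokens = ["getUserId", "return", "id"]
attention = [0.7, 0.2, 0.1]

final = pointer_generator_step(0.4, vocab_dist, attention, source_tokens)
best = max(final, key=final.get)
print(best, round(final[best], 2))
```

With a low p_gen, most of the probability mass flows through the copy path, so an identifier that the vocabulary cannot produce can still be emitted if attention concentrates on it.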

Key words: software quality assurance; source code comments generation; neural network; out-of-vocabulary word; pointer-generator network

Get Citation

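NIU Chang-An, GE Ji-Dong, TANG Ze, LI Chuan-Yi, ZHOU Yu, LUO Bin. Automatic Generation of Source Code Comments Model Based on Pointer-generator Network. Journal of Software, 2021, 32(7): 2142-2165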



History

Received: September 15, 2020
Revised: October 26, 2020
Online: January 22, 2021
Published: July 06, 2021
