Method Combining Structural and Semantic Features to Support Code Commenting Decision

doi:10.13328/j.cnki.jos.005528

Journal of Software, 2018, Vol. 29, No. 8, pp. 2226-2242

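Authors:
HUANG Yuan
School of Data and Computer Science, Sun Yat-sen University, Guangzhou, Guangdong 510006, China; National Engineering Research Center of Digital Life, Guangzhou, Guangdong 510006, China
JIA Nan
School of Data and Computer Science, Sun Yat-sen University, Guangzhou, Guangdong 510006, China; School of Management Science and Engineering, Hebei GEO University, Shijiazhuang, Hebei 050031, China
ZHOU Qiang
School of Data and Computer Science, Sun Yat-sen University, Guangzhou, Guangdong 510006, China; National Engineering Research Center of Digital Life, Guangzhou, Guangdong 510006, China
CHEN Xiang-Ping
National Engineering Research Center of Digital Life, Guangzhou, Guangdong 510006, China; Institute of Advanced Technology, Sun Yat-sen University, Guangzhou, Guangdong 510006, China
XIONG Ying-Fei
Institute of Software, School of Electronics Engineering and Computer Science, Peking University, Beijing 100871, China; Key Laboratory of High Confidence Software Technologies (Peking University), Ministry of Education, Beijing 100871, China
LUO Xiao-Nan
School of Data and Computer Science, Sun Yat-sen University, Guangzhou, Guangdong 510006, China; National Engineering Research Center of Digital Life, Guangzhou, Guangdong 510006, China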

                    
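Author biographies: HUANG Yuan (1987-), male, born in Neijiang, Sichuan, Ph.D., associate research fellow; research interests: software engineering, program comprehension, and mining of software engineering process data. CHEN Xiang-Ping (1981-), female, Ph.D., assistant research fellow, CCF professional member; research interests: data-driven software engineering, program comprehension, and Web engineering. JIA Nan (1988-), female, Ph.D., lecturer; research interests: computer simulation and data mining. XIONG Ying-Fei (1982-), male, Ph.D., research fellow, doctoral supervisor, CCF professional member; research interests: software engineering and programming languages. ZHOU Qiang (1993-), male, master's degree; research interests: software engineering and mining of software engineering process data. LUO Xiao-Nan (1963-), male, Ph.D., professor, doctoral supervisor; research interests: graphics and image processing, 3D simulation and CAD technology, and digital home technology.
Corresponding author: CHEN Xiang-Ping, E-mail: chenxp8@mail.sysu.edu.cn
Funding: National Key Research and Development Program of China (2016YFB1000101); National Natural Science Foundation of China (61672545, 61402546); Science and Technology Planning Project of Guangdong Province (2013B090700009); Science and Technology Planning Project of Zhongshan City (2016A1044)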


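Abstract:

Code comments are an effective means of helping developers understand source code. Good commenting decisions cover the core code snippets of a software system while avoiding redundant comments. In practice, however, developers often lack a unified commenting standard, and most commenting decisions depend on personal experience and domain knowledge. For novice programmers in particular, deciding where to comment is an important yet difficult task. To reduce the effort developers spend on commenting decisions, this paper learns a general commenting standard from a large number of commenting instances and proposes CommentAdviser, a method that helps developers make appropriate commenting decisions while writing code. Because commenting decisions are closely tied to the context of the code itself, the method extracts structural features and semantic features from the code surrounding the current line as the main basis for the decision. It then uses machine learning algorithms to determine whether the current line is a likely commenting location. The method is evaluated on data sets from 10 large open-source projects on GitHub; the experimental results and a user study demonstrate the feasibility and effectiveness of CommentAdviser.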

Key words: code comment; structural feature; semantic feature; machine learning; comment decision
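The pipeline outlined in the abstract (context window, structural and semantic features, then a learned classifier) can be illustrated with a minimal sketch. This is not CommentAdviser itself: the abstract does not list the actual features or name the learning algorithm, so the keyword counts, the sub-word vocabulary, the logistic-regression classifier, the toy labels, and every function name below are assumptions chosen only for illustration.

```python
import math
import re
from collections import Counter

# Assumption: hand-picked stand-ins for the paper's (unlisted) features.
CONTROL_KEYWORDS = ("if", "for", "while", "try", "return")
VOCABULARY = ("read", "file", "parse", "index")  # hypothetical vocabulary


def context_window(lines, index, radius=2):
    """Return the lines within `radius` lines of `index`, inclusive."""
    start = max(0, index - radius)
    return lines[start:index + radius + 1]


def structural_features(window):
    """Count occurrences of each control-flow keyword in the window."""
    counts = Counter(re.findall(r"[A-Za-z_]\w*", "\n".join(window)))
    return [counts[k] for k in CONTROL_KEYWORDS]


def semantic_features(window):
    """Count vocabulary words among the sub-words of all identifiers."""
    words = Counter()
    for identifier in re.findall(r"[A-Za-z_]\w*", "\n".join(window)):
        # Split camelCase / snake_case names into lowercase sub-words.
        words.update(p.lower() for p in re.findall(r"[A-Za-z][a-z]*", identifier))
    return [words[v] for v in VOCABULARY]


def features(lines, index, radius=2):
    """Concatenate structural and semantic features for one code line."""
    window = context_window(lines, index, radius)
    return structural_features(window) + semantic_features(window)


def train_logistic(X, y, epochs=200, lr=0.1):
    """Fit logistic regression by stochastic gradient ascent (bias is w[-1])."""
    w = [0.0] * (len(X[0]) + 1)
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            error = yi - predict(w, xi)
            for j, xj in enumerate(xi):
                w[j] += lr * error * xj
            w[-1] += lr * error
    return w


def predict(w, x):
    """Return the estimated probability that the line is a commenting point."""
    z = w[-1] + sum(wj * xj for wj, xj in zip(w, x))
    return 1.0 / (1.0 + math.exp(-z))


code = [
    "File file = new File(path);",
    "for (String line : readFile(file)) {",
    "    if (line.isEmpty()) continue;",
    "    parseLine(line);",
    "}",
    "int i = 0;",
    "i++;",
]
# Toy labels (assumption): line 1 is a commenting point, line 6 is not.
X = [features(code, 1, radius=1), features(code, 6, radius=1)]
y = [1, 0]
weights = train_logistic(X, y)
for row in X:
    print(round(predict(weights, row), 3))
```

With only two training lines, the sketch shows the data flow and nothing more; a realistic setup would train on many labeled lines and would likely use richer semantic representations than word counts.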



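Cite this article

HUANG Yuan, JIA Nan, ZHOU Qiang, CHEN Xiang-Ping, XIONG Ying-Fei, LUO Xiao-Nan. Method combining structural and semantic features to support code commenting decision. Journal of Software, 2018, 29(8): 2226-2242.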


History

Received: 2017-07-18
Last revised: 2017-09-28
Published online: 2018-03-13
