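Method Combining Structural and Semantic Features to Support Code Commenting Decision
Authors:

Huang Yuan, Jia Nan, Zhou Qiang, Chen Xiangping, Xiong Yingfei, Luo Xiaonan

Author biographies:

Huang Yuan (1987-), male, born in Neijiang, Sichuan, Ph.D., associate research fellow. Research interests: software engineering, program comprehension, and mining of software engineering process data.
Chen Xiangping (1981-), female, Ph.D., assistant research fellow, CCF professional member. Research interests: data-driven software engineering, program comprehension, and Web engineering.
Jia Nan (1988-), female, Ph.D., lecturer. Research interests: computer simulation and data mining.
Xiong Yingfei (1982-), male, Ph.D., research fellow, doctoral supervisor, CCF professional member. Research interests: software engineering and programming languages.
Zhou Qiang (1993-), male, master's degree. Research interests: software engineering and mining of software engineering process data.
Luo Xiaonan (1963-), male, Ph.D., professor, doctoral supervisor. Research interests: graphics and image processing, 3D simulation CAD technology, and digital home technology.

Corresponding author:

Chen Xiangping, E-mail: chenxp8@mail.sysu.edu.cn

Funding:

National Key Research and Development Program of China (2016YFB1000101); National Natural Science Foundation of China (61672545, 61402546); Science and Technology Planning Project of Guangdong Province (2013B090700009); Science and Technology Planning Project of Zhongshan City (2016A1044)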

    Abstract:

    Code comments are one of the effective means of helping developers review and comprehend source code. A good commenting decision covers the core code snippets of a software system while avoiding redundant, trivial comments. In practice, however, developers lack a unified commenting specification, and most commenting decisions depend on personal experience and domain knowledge. For novice programmers in particular, deciding where to comment is an important yet difficult task. To reduce the effort developers spend on commenting decisions, this paper learns a general commenting convention from a large number of commenting instances and proposes a method, CommentAdviser, to guide developers in placing comments during development. Because commenting decisions are closely related to the context of the code itself, the method extracts structural features and semantic features from the code surrounding the current line and uses them as the main evidence for the decision. Machine learning techniques are then applied to determine whether the current line is a likely commenting location. CommentAdviser is evaluated on data sets from 10 large open-source projects on GitHub. The experimental results, together with a user study, demonstrate the feasibility and effectiveness of CommentAdviser.
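The pipeline outlined in the abstract (context features around a code line, then a classifier that flags likely commenting locations) can be sketched roughly as follows. This is a minimal illustration and not the CommentAdviser implementation: the window size, the keyword list, the bag of identifier words used as a stand-in for semantic features, and the small perceptron classifier are all assumptions chosen to keep the example self-contained.

```python
import re
from collections import Counter

# Hypothetical structural cues: control-flow keywords counted in the context window.
KEYWORDS = ("if", "for", "while", "try", "catch", "return", "switch")
IDENT = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def split_identifier(name):
    """Split camelCase / snake_case identifiers into lowercase word tokens."""
    parts = re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|\d+", name)
    return [p.lower() for p in parts]


def extract_features(lines, index, window=2):
    """Build a sparse feature dict for lines[index] from its surrounding lines."""
    lo, hi = max(0, index - window), min(len(lines), index + window + 1)
    features = Counter()
    for line in lines[lo:hi]:
        for token in IDENT.findall(line):
            if token in KEYWORDS:
                # Structural feature: statement keywords near the line.
                features["kw:" + token] += 1
            else:
                # Semantic feature (assumed stand-in): words inside identifiers.
                for word in split_identifier(token):
                    features["w:" + word] += 1
    # Structural feature: the line directly follows a blank line.
    if index > 0 and not lines[index - 1].strip():
        features["after_blank"] = 1
    return dict(features)


def train_perceptron(samples, epochs=10):
    """Train on (feature_dict, label) pairs; label 1 means 'comment here'."""
    weights, bias = Counter(), 0.0
    for _ in range(epochs):
        for feats, label in samples:
            score = bias + sum(weights[k] * v for k, v in feats.items())
            pred = 1 if score > 0 else 0
            if pred != label:
                step = label - pred  # +1 or -1 on a mistake
                for k, v in feats.items():
                    weights[k] += step * v
                bias += step
    return weights, bias


def predict(model, feats):
    """Return 1 if the line is predicted to be a commenting location, else 0."""
    weights, bias = model
    score = bias + sum(weights[k] * v for k, v in feats.items())
    return 1 if score > 0 else 0


if __name__ == "__main__":
    code = [
        "int total = 0;",
        "",
        "for (Item item : items) {",
        "    total += item.getPrice();",
        "}",
    ]
    # Toy labels (made up for illustration): line 2 is commented, line 3 is not.
    samples = [(extract_features(code, 2), 1), (extract_features(code, 3), 0)]
    model = train_perceptron(samples)
    print(predict(model, extract_features(code, 2)))
```

A real system would train on labeled lines mined from many projects and would likely use richer structural and semantic representations; the perceptron here only shows where a learned classifier plugs into the feature extraction step.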

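Cite this article

Huang Yuan, Jia Nan, Zhou Qiang, Chen Xiangping, Xiong Yingfei, Luo Xiaonan. Method combining structural and semantic features to support code commenting decision. Journal of Software, 2018, 29(8): 2226-2242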

History
  • Received: 2017-07-18
  • Last revised: 2017-09-28
  • Published online: 2018-03-13