Auxiliary Method for Code Commit Comprehension Based on Core-Class Identification
Authors: Huang Yuan, Liu Zhiyong, Chen Xiangping, Xiong Yingfei, Luo Xiaonan
Fund Project:

NSFC-Guangdong Joint Fund (U1201252); National Key Research and Development Program of China (2016YFB1000101); National Natural Science Foundation of China (61672545, 61672045); Science and Technology Planning Project of Guangdong Province (2015B040403005)

    Abstract:

    Code commits are among the most important data on software version evolution and are widely used in software review and program comprehension. For a developer, a commit becomes harder to understand as the number of affected classes and the amount of modified code grow. By analyzing a large amount of commit data, this study finds that distinguishing the core modified classes in a commit (core classes) from the classes changed only as dependent modifications needed to complete the core change (non-core classes) helps developers understand the commit. Inspired by the effectiveness of machine learning techniques in classification, the paper proposes a machine-learning-based approach that models core class identification as a binary classification problem (i.e., core and non-core classes) and extracts discriminative features from the large volume of commit data produced during software evolution to measure how core a class is. Experimental results on multiple data sets show that the approach identifies core classes with an overall accuracy of 87%, and that developers who are given core-class hints understand commits with significantly higher efficiency and correctness than developers who read the commits directly.
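The binary-classification framing in the abstract can be illustrated with a small sketch. The abstract does not name the classifier or the features used, so everything below is an assumption made for illustration: the three features (`changed_line_ratio`, `dependents_in_commit`, `named_in_message`), the class names, the toy training data, and the nearest-centroid classifier, which stands in for whatever learner the paper actually uses.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class ModifiedClass:
    """One class touched by a commit, with hypothetical features."""
    name: str
    changed_line_ratio: float  # share of the commit's changed lines in this class
    dependents_in_commit: int  # other modified classes that reference this class
    named_in_message: int      # 1 if the commit message mentions the class name

    def features(self):
        return [self.changed_line_ratio,
                float(self.dependents_in_commit),
                float(self.named_in_message)]


def centroid(rows):
    """Column-wise mean of a list of feature vectors."""
    return [mean(col) for col in zip(*rows)]


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


class NearestCentroidClassifier:
    """Stand-in binary classifier: label a class by its closer centroid."""

    def fit(self, samples, labels):
        core = [s.features() for s, y in zip(samples, labels) if y]
        non_core = [s.features() for s, y in zip(samples, labels) if not y]
        self.core_centroid = centroid(core)
        self.non_core_centroid = centroid(non_core)
        return self

    def is_core(self, sample):
        f = sample.features()
        return (squared_distance(f, self.core_centroid)
                < squared_distance(f, self.non_core_centroid))


def accuracy(model, samples, labels):
    """Fraction of samples whose predicted label matches the given label."""
    hits = sum(model.is_core(s) == y for s, y in zip(samples, labels))
    return hits / len(samples)


# Toy labeled history (invented data): True = core, False = non-core.
train = [
    ModifiedClass("OrderService", 0.60, 3, 1),
    ModifiedClass("PaymentGateway", 0.50, 2, 1),
    ModifiedClass("OrderDto", 0.10, 0, 0),
    ModifiedClass("StringUtil", 0.05, 1, 0),
]
labels = [True, True, False, False]

model = NearestCentroidClassifier().fit(train, labels)

# Classes modified by a new, unlabeled commit.
new_commit = [
    ModifiedClass("InvoiceBuilder", 0.70, 2, 0),
    ModifiedClass("InvoiceTest", 0.10, 0, 0),
]
core_hints = [c.name for c in new_commit if model.is_core(c)]
```

Here `core_hints` plays the role of the core-class hint shown to a reviewer. In this sketch the features are unscaled, so the dependency count dominates the distance; a real pipeline would likely normalize features and evaluate on held-out commits rather than on the training data.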

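Cite this article

Huang Y, Liu ZY, Chen XP, Xiong YF, Luo XN. Auxiliary method for code commit comprehension based on core-class identification. Ruan Jian Xue Bao/Journal of Software, 2017,28(6):1418-1434 (in Chinese with English abstract).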

History
  • Received: 2016-07-28
  • Last revised: 2016-10-11
  • Published online: 2017-02-21