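Approach of Bug Reports Classification Based on Cost Extreme Learning Machine

Author biographies:

Zhang Tianlun (1991-), male, born in Baoding, Hebei, Ph.D. candidate; research interests: machine learning, software engineering, computer vision. Yang Xi (1993-), male, master's student; research interests: machine learning, computer vision, fuzzy sets. Chen Rong (1969-), male, Ph.D., professor, doctoral supervisor, CCF senior member; research interests: machine learning, software fault diagnosis, behavior recognition, operations research. Zhu Hongyu (1994-), female, master's degree; research interests: neural networks, imbalanced data processing, big data algorithms.

Corresponding author:

Chen Rong, E-mail: rchen@dlmu.edu.cn

Fund project:

National Natural Science Foundation of China (61672122, 61602077, 61732011)
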
    Abstract:

    Bugs are unavoidable in the development of any software system, and bug reports are among the most useful tools developers have for fixing them. Manually triaging bug reports, however, is time-consuming and places an additional burden on developers, so automatic classification of bug reports is needed, for example to assign a reasonable severity to a reported bug. This study proposes several algorithms based on the extreme learning machine (ELM) to classify bug reports automatically. Specifically, it addresses three problems in bug report classification: first, the imbalanced class distribution in bug report datasets; second, the shortage of labeled samples in those datasets; and third, the limited overall amount of training data. To address them, three methods are proposed, based respectively on cost-sensitive supervised classification, fuzziness-based semi-supervised learning, and sample transfer. Experiments on multiple real bug report datasets demonstrate the feasibility and effectiveness of the proposed methods.
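
To make the cost-sensitive idea concrete, the following is a minimal sketch of a generic class-weighted, regularized ELM for binary labels, written with NumPy. It is not the formulation used in the paper: the function names `train_weighted_elm` and `predict_elm`, the tanh activation, the hidden-layer size, the regularization constant `C`, and the inverse-class-frequency cost scheme are all assumptions chosen for illustration.

```python
import numpy as np

def train_weighted_elm(X, y, n_hidden=50, C=1.0, seed=0):
    """Fit a class-weighted ELM for integer labels y in {0, 1} (illustrative)."""
    rng = np.random.default_rng(seed)
    # Input weights and biases are drawn at random and never trained,
    # which is the defining feature of an extreme learning machine.
    W = rng.standard_normal((X.shape[1], n_hidden))
    b = rng.standard_normal(n_hidden)
    H = np.tanh(X @ W + b)                  # hidden-layer output matrix
    T = np.where(y == 1, 1.0, -1.0)         # regression targets in {-1, +1}
    # Assumed cost scheme: weight each sample inversely to its class size,
    # scaled so that the average weight is 1.
    counts = np.bincount(y, minlength=2)
    cost = len(y) / (2.0 * counts[y])
    # Closed-form weighted ridge solution:
    #   beta = (I / C + H^T D H)^{-1} H^T D T, with D = diag(cost)
    HtD = H.T * cost
    beta = np.linalg.solve(np.eye(n_hidden) / C + HtD @ H, HtD @ T)
    return W, b, beta

def predict_elm(model, X):
    """Return 0/1 predictions from the sign of the ELM output."""
    W, b, beta = model
    return (np.tanh(X @ W + b) @ beta > 0).astype(int)
```

Because errors on the rarer class are weighted more heavily in the least-squares objective, the output weights are less dominated by the majority class than in an unweighted ELM; only the output weights are solved for, so training remains a single linear solve.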

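Cite this article

Zhang Tianlun, Chen Rong, Yang Xi, Zhu Hongyu. Approach of Bug Reports Classification Based on Cost Extreme Learning Machine. Journal of Software (软件学报), 2019, 30(5): 1386-1406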

History
  • Received: 2018-08-31
  • Last revised: 2018-10-31
  • Published online: 2019-05-08