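Model-domain Data Augmentation Using Generative Adversarial Network for Fault Localization
Authors: Zhang Z, Lei Y, Mao XG, Xue JX, Chang X
Funding: National Natural Science Foundation of China (62272072); Fundamental Research Funds for the Central Universities (2022CDJDX-005)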
    Abstract:

    Fault localization collects and analyzes the runtime information of a test suite to estimate how suspicious each statement is of being faulty. A test suite is built from input-domain data and contains two types of test cases: passing and failing. Because failing test cases are irregularly distributed in the input domain and make up only a small fraction of it, they are usually far fewer than passing test cases. Previous work has shown that this scarcity of failing test cases causes a class imbalance problem in the test suite, which severely hampers fault localization effectiveness. To address this problem, this study proposes a model-domain data augmentation approach for fault localization based on generative adversarial networks. Working in the model domain (i.e., the spectrum information used by fault localization) rather than the traditional input domain (i.e., program input), the approach uses a generative adversarial network to synthesize model-domain failing test cases that cover the minimum suspicious set, thereby addressing class imbalance in the model domain. Experimental results show that the proposed approach significantly improves the effectiveness of 12 representative fault localization approaches.
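
    As a rough illustration of why extra failing test cases in the model domain can help, the sketch below scores statements from a small coverage matrix before and after one synthetic failing coverage vector is appended. It is not the authors' implementation: the Ochiai formula is assumed here as one example of a spectrum-based technique, the coverage data are made up, and the synthetic vector is hand-written to stand in for what a trained generator might produce.

```python
import math


def ochiai(coverage, failed):
    """Return one Ochiai suspiciousness score per statement.

    coverage[i][j] is 1 if test i executed statement j, else 0.
    failed[i] is True if test i failed.
    Ochiai is an assumed example formula, not one named in the abstract.
    """
    total_failed = sum(failed)
    scores = []
    for j in range(len(coverage[0])):
        # ef / ep: failing / passing tests that executed statement j
        ef = sum(1 for row, f in zip(coverage, failed) if row[j] and f)
        ep = sum(1 for row, f in zip(coverage, failed) if row[j] and not f)
        denom = math.sqrt(total_failed * (ef + ep))
        scores.append(ef / denom if denom else 0.0)
    return scores


# Hypothetical spectrum: 4 passing tests, 1 failing test, 4 statements.
coverage = [
    [1, 1, 0, 1],  # pass
    [1, 0, 1, 1],  # pass
    [1, 1, 1, 0],  # pass
    [1, 0, 0, 1],  # pass
    [1, 1, 1, 0],  # fail
]
failed = [False, False, False, False, True]
before = ochiai(coverage, failed)

# Hand-written placeholder for a synthesized model-domain failing test:
# a coverage vector labeled as failing, with no program input behind it.
synthetic_failing = [1, 0, 1, 0]
after = ochiai(coverage + [synthetic_failing], failed + [True])

print("before:", [round(s, 3) for s in before])
print("after: ", [round(s, 3) for s in after])
```

    In this toy example, statements 1 and 2 (zero-based) tie under the original suite, while the added failing vector makes statement 2 the unique top-ranked candidate.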

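Cite this article

Zhang Z, Lei Y, Mao XG, Xue JX, Chang X. Model-domain data augmentation using generative adversarial network for fault localization. Ruan Jian Xue Bao/Journal of Software, 2024, 35(5):2289-2306 (in Chinese with English abstract).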

History
  • Received: 2022-01-07
  • Last revised: 2022-11-17
  • Published online: 2023-08-30
  • Publication date: 2024-05-06