Fine-grained Defect Localization Based on Pointer Neural Network

    Abstract:

    Software defect localization is the task of finding the program elements responsible for a software failure. Existing defect localization techniques, however, report results only at the function or statement level. Such coarse-grained results limit the efficiency and effectiveness of both manual debugging and automatic software defect repair. This study targets fine-grained localization, that is, identifying the specific code tokens that lead to a defect. It builds abstract syntax tree (AST) paths for code tokens and proposes a fine-grained defect localization model based on a pointer neural network, which predicts both the defective code tokens and the operations needed to repair them. The defect patches available in open-source projects supply a large amount of training data, and the AST-based paths effectively capture the structural information of programs. Experimental results show that the trained model predicts defective code tokens accurately and significantly outperforms baseline methods based on statistics and machine learning. To verify that fine-grained localization results can contribute to automatic defect repair, two program repair processes are designed on top of them: one uses code completion tools to predict the correct token, and the other follows heuristic rules to find appropriate code repair elements. The results show that both methods can effectively solve the overfitting problem in automatic software defect repair.
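
    To make the pointer mechanism concrete, the following is a minimal sketch of additive pointer attention in the style of pointer networks, applied to choosing one token out of a statement. It is not the model from this study: the token encodings, the weights `w_enc`, `w_query`, and `v`, and the helper names `pointer_distribution` and `locate_token` are all hypothetical, and the hand-written vectors stand in for encodings that a trained model would derive from AST paths. Prediction of the repair operation is not covered.

```python
import math
from typing import List, Sequence, Tuple

Vector = List[float]
Matrix = List[List[float]]


def matvec(m: Matrix, x: Sequence[float]) -> Vector:
    """Multiply matrix m (rows x cols) by vector x (length cols)."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in m]


def softmax(scores: Sequence[float]) -> Vector:
    """Numerically stable softmax over a list of scores."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def pointer_distribution(
    encodings: Sequence[Vector],
    query: Vector,
    w_enc: Matrix,
    w_query: Matrix,
    v: Vector,
) -> Vector:
    """Score every input position with u_i = v . tanh(W_enc e_i + W_query q)
    and normalize the scores into a distribution over positions."""
    projected_query = matvec(w_query, query)
    scores = []
    for encoding in encodings:
        projected = matvec(w_enc, encoding)
        hidden = [math.tanh(a + b) for a, b in zip(projected, projected_query)]
        scores.append(sum(vi * hi for vi, hi in zip(v, hidden)))
    return softmax(scores)


def locate_token(
    tokens: Sequence[str],
    encodings: Sequence[Vector],
    query: Vector,
    w_enc: Matrix,
    w_query: Matrix,
    v: Vector,
) -> Tuple[str, Vector]:
    """Return the token the pointer selects, plus the full distribution."""
    probs = pointer_distribution(encodings, query, w_enc, w_query, v)
    best = max(range(len(tokens)), key=probs.__getitem__)
    return tokens[best], probs


# Toy input: tokens of `if (i <= n)` with made-up 2-d encodings (assumption).
tokens = ["if", "(", "i", "<=", "n", ")"]
encodings = [
    [0.1, 0.0],
    [0.0, 0.1],
    [0.5, 0.2],
    [2.0, 2.0],  # deliberately large so the pointer favors "<="
    [0.3, 0.4],
    [0.0, 0.1],
]
identity = [[1.0, 0.0], [0.0, 1.0]]  # untrained placeholder weights
located, probs = locate_token(
    tokens, encodings, query=[0.0, 0.0], w_enc=identity, w_query=identity, v=[1.0, 1.0]
)
print(located, [round(p, 3) for p in probs])
```

    The output of a pointer layer is a distribution over the positions of the input itself rather than over a fixed vocabulary, which is why it suits the task of selecting which existing token in a statement is defective.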

Get Citation

Wang SW, Liu K, Lin B, Li L, Klein J, Bissyandé TF, Mao XG. Fine-grained defect localization based on pointer neural network. Journal of Software (软件学报), 2024, 35(4): 1841-1860.

History
  • Received: November 16, 2021
  • Revised: June 5, 2022
  • Online: August 23, 2023
  • Published: April 6, 2024