Statement Level Software Bug Localization Based on Historical Bug Information Retrieval
Authors: Yue Lei, Cui Zhanqi, Chen Xiang, Wang Rongcun, Li Li

CLC Number: TP311

    Abstract:

    A large number of bug reports are generated during software development and maintenance, and they can help developers locate bugs. Information retrieval based bug localization (IRBL) analyzes the similarity between bug reports and source code files to locate bugs, and achieves high accuracy at the file and function levels. However, because the localization granularity of IRBL is coarse, developers still spend considerable labor and time finding bugs within the suspicious files and function fragments. This study proposes STMTLocator, a statement level software bug localization method based on historical bug information retrieval. First, it retrieves historical bug reports that are similar to the bug report of the program under test and extracts the bug statements from those historical bug reports. Then, it retrieves suspicious files according to the text similarity between the source code files and the bug report of the program under test, and extracts suspicious statements from the suspicious files. Finally, it calculates the similarity between the suspicious statements and the historical bug statements, and ranks the suspicious statements in descending order of similarity to localize bug statements. To evaluate the bug localization performance of STMTLocator, comparative experiments are conducted on the Defects4J and JIRA datasets with Top@N, MRR, and other evaluation metrics. The experimental results show that STMTLocator achieves an MRR nearly four times that of the static bug localization method BugLocator and locates 7 more bug statements at Top@1. The average time STMTLocator takes to locate a bug version is 98.37% and 63.41% lower than that of the dynamic bug localization methods Metallaxis and DStar, respectively, and STMTLocator has the significant advantage of not requiring the construction and execution of test cases.
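    The three retrieval steps described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the abstract does not say which similarity measure STMTLocator uses, so a plain term-frequency cosine similarity stands in for it, and every name here (`tokenize`, `cosine`, `top_k`, `rank_statements`, the dictionary keys, and the `k_reports` and `k_files` cut-offs) is a hypothetical choice made for the example.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Split camelCase identifiers, lowercase, and keep alphanumeric tokens."""
    text = re.sub(r"([a-z0-9])([A-Z])", r"\1 \2", text)
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Term-frequency cosine similarity (an assumed stand-in measure)."""
    va, vb = Counter(tokenize(a)), Counter(tokenize(b))
    dot = sum(va[t] * vb[t] for t in va)
    norm = math.sqrt(sum(v * v for v in va.values())) * math.sqrt(
        sum(v * v for v in vb.values())
    )
    return dot / norm if norm else 0.0


def top_k(query, candidates, text_of, k):
    """Return the k candidates whose text is most similar to the query."""
    return sorted(candidates, key=lambda c: cosine(query, text_of(c)), reverse=True)[:k]


def rank_statements(new_report, history, files, k_reports=1, k_files=1):
    # Step 1: retrieve similar historical bug reports and collect their bug statements.
    similar = top_k(new_report, history, lambda h: h["report"], k_reports)
    bug_statements = [s for h in similar for s in h["bug_statements"]]

    # Step 2: retrieve suspicious files and split them into candidate statements.
    suspicious = top_k(new_report, files, lambda f: f["source"], k_files)
    candidates = [
        (f["path"], number, line.strip())
        for f in suspicious
        for number, line in enumerate(f["source"].splitlines(), start=1)
        if line.strip()
    ]

    # Step 3: score each candidate by its best match against the historical
    # bug statements, then sort in descending order of score.
    scored = [
        (max((cosine(stmt, b) for b in bug_statements), default=0.0), path, number, stmt)
        for path, number, stmt in candidates
    ]
    return sorted(scored, key=lambda item: item[0], reverse=True)


# Toy data, invented purely for illustration.
history = [
    {"report": "NullPointerException when parsing empty date string",
     "bug_statements": ["return parser.parse(value.trim());"]},
    {"report": "Chart legend overlaps the axis label",
     "bug_statements": ["legend.setPosition(axis.getLabelOffset());"]},
]
files = [
    {"path": "DateUtil.java",
     "source": "public Date parseDate(String value) {\n    return parser.parse(value.trim());\n}"},
    {"path": "Legend.java",
     "source": "void draw() {\n    render(legend);\n}"},
]
new_report = "parseDate throws NullPointerException on empty date value"

ranked = rank_statements(new_report, history, files)
for score, path, number, stmt in ranked:
    print(f"{score:.3f}  {path}:{number}  {stmt}")
```

    Scoring each candidate by its maximum similarity to any historical bug statement is one simple way to combine the matches; an average or a weighted sum would fit the description in the abstract equally well.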

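Citation:

Yue L, Cui ZQ, Chen X, Wang RC, Li L. Statement level software bug localization method based on historical bug information retrieval. Ruan Jian Xue Bao/Journal of Software, 2024, 35(10): 4642-4661 (in Chinese with English abstract).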

History
  • Received: February 06, 2023
  • Revised: March 28, 2023
  • Online: September 06, 2023
  • Published: October 06, 2024