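Software Bug Location Method Combining Information Retrieval and Deep Model Features
Authors: Shen Zongwen (申宗汶), Niu Feifei (牛菲菲), Li Chuanyi (李传艺), Chen Xiang (陈翔), Li Qi (李奇), Ge Jidong (葛季栋), Luo Bin (骆斌)

Author biographies:

Shen Zongwen (born 2000), male, master's student, CCF student member. Research interest: software bug localization.
Niu Feifei (born 1995), female, Ph.D., CCF student member. Research interests: bug localization, user feature request analysis.
Li Chuanyi (born 1991), male, Ph.D., tenure-track assistant professor, doctoral supervisor, CCF professional member. Research interests: software engineering, business process management, natural language processing.
Chen Xiang (born 1980), male, Ph.D., associate professor, CCF senior member. Research interests: intelligent software engineering, mining software repositories, empirical software engineering.
Li Qi (born 1998), male, master's student. Research interests: natural language processing, intelligent software engineering.
Ge Jidong (born 1978), male, Ph.D., associate professor, doctoral supervisor, CCF senior member. Research interests: natural language processing, intelligent software engineering, distributed computing, edge computing, service computing, business process management.
Luo Bin (born 1967), male, Ph.D., professor, doctoral supervisor, CCF distinguished member. Research interests: distributed computing, edge computing, natural language processing, intelligent software engineering.

Corresponding author:

Ge Jidong, E-mail: gjd@nju.edu.cn

Funding:

National Key Research and Development Program of China (2022YFF0711404); Leading Talent Team Project of the Sixth "333 Project" of Jiangsu Province; Natural Science Foundation of Jiangsu Province (BK20201250)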
    Abstract:

    Automated bug localization helps programmers use bug reports to find the defective code in complex software systems more quickly. Early work treated bug localization as a retrieval task: researchers constructed defect features by analyzing bug reports and the related code, then applied information retrieval techniques to locate the bug. With the development of deep learning, bug localization methods that use deep model features have also achieved some success. However, because training deep models is costly in time and resources, existing deep learning-based studies evaluate on a search space that does not match real-world conditions. At test time, methods such as DNNLOC, DeepLocator, and DreamLoc do not search all the code in a project; they search only the code associated with known defects. This differs from how programmers actually search when localizing a bug. To simulate the real-world scenario, this study proposes TosLoc, a bug localization method that combines information retrieval with deep model features. TosLoc first uses information retrieval to search all the source code of a real project, which ensures that existing features are fully exploited. It then uses deep models to mine the semantics of the source code and the bug report and to produce the final localization result. Through this two-stage retrieval, TosLoc achieves fast bug localization over all the code of a single project. Experiments on four widely used real-world Java projects show that TosLoc outperforms existing baseline methods in both retrieval speed and accuracy. Compared with DreamLoc, the best-performing baseline, TosLoc uses only 35% of DreamLoc's retrieval time while improving average MRR by 2.5% and average MAP by 6.0%.
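
    The two-stage idea in the abstract can be illustrated with a minimal sketch. It is not the TosLoc implementation: TF-IDF cosine similarity stands in for the information retrieval stage, `overlap_score` is a hypothetical placeholder for the deep model, and the function names, the cutoff `k`, and the toy Java snippets are all assumptions made for illustration. Stage one scores every file in the project cheaply; stage two applies the more expensive scorer only to the top `k` candidates. The sketch also includes the per-query quantities behind MRR and MAP (reciprocal rank and average precision).

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def tfidf_cosine(query_tokens, doc_tokens, idf):
    """Cosine similarity between the TF-IDF vectors of a query and a document."""
    q = {t: c * idf.get(t, 0.0) for t, c in Counter(query_tokens).items()}
    d = {t: c * idf.get(t, 0.0) for t, c in Counter(doc_tokens).items()}
    dot = sum(w * d.get(t, 0.0) for t, w in q.items())
    norm = math.sqrt(sum(w * w for w in q.values())) * math.sqrt(
        sum(w * w for w in d.values())
    )
    return dot / norm if norm else 0.0


def overlap_score(report, source):
    """Hypothetical stand-in for a deep model: Jaccard overlap of token sets."""
    a, b = set(tokenize(report)), set(tokenize(source))
    return len(a & b) / len(a | b) if a | b else 0.0


def two_stage_rank(report, files, rerank_score, k=2):
    """Stage 1 ranks all files with TF-IDF; stage 2 reranks only the top k."""
    tokens = {path: tokenize(src) for path, src in files.items()}
    n = len(files)
    df = Counter(t for toks in tokens.values() for t in set(toks))
    idf = {t: math.log((1 + n) / (1 + c)) + 1.0 for t, c in df.items()}
    query = tokenize(report)
    stage1 = sorted(
        files, key=lambda p: tfidf_cosine(query, tokens[p], idf), reverse=True
    )
    candidates, rest = stage1[:k], stage1[k:]
    reranked = sorted(
        candidates, key=lambda p: rerank_score(report, files[p]), reverse=True
    )
    return reranked + rest


def reciprocal_rank(ranking, buggy):
    """1 / rank of the first buggy file; MRR is the mean over bug reports."""
    for i, path in enumerate(ranking, start=1):
        if path in buggy:
            return 1.0 / i
    return 0.0


def average_precision(ranking, buggy):
    """Mean precision at each buggy file's rank; MAP is the mean over reports."""
    hits, total = 0, 0.0
    for i, path in enumerate(ranking, start=1):
        if path in buggy:
            hits += 1
            total += hits / i
    return total / len(buggy) if buggy else 0.0


# Toy project and bug report (illustrative data only).
files = {
    "Parser.java": "class Parser { void parse(String input) { tokens = lexer.scan(input); } }",
    "Lexer.java": "class Lexer { List scan(String input) { return split(input); } }",
    "Logger.java": "class Logger { void log(String message) { print(message); } }",
}
report = "Parser crashes with null pointer when parse input is empty"

ranking = two_stage_rank(report, files, overlap_score, k=2)
print(ranking)
print(reciprocal_rank(ranking, {"Parser.java"}))
```

    In this sketch `rerank_score` is an ordinary callable, so a learned relevance model could replace the placeholder without changing the retrieval code. Files outside the top `k` keep their stage-one order, which is why the full project stays in the search space while the expensive scorer runs on only a few candidates.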

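Cite this article

Shen ZW, Niu FF, Li CY, Chen X, Li Q, Ge JD, Luo B. Software bug location method combining information retrieval and deep model features. Ruan Jian Xue Bao/Journal of Software, 2024, 35(7): 3245-3264.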

History
  • Received: 2023-09-11
  • Last revised: 2023-10-30
  • Published online: 2024-01-05
  • Publication date: 2024-07-06