Survey on Application and Development of Large Language Models in Software Defect Detection and Repair

doi:10.13328/j.cnki.jos.007268

Journal of Software, 2025, Vol. 36, No. 4, pp. 1489-1529

CSTR: 32375.14.jos.007268
                    
Fund project: National Natural Science Foundation of China (62372220)

Authors:

XIANG Jia-Hong (香佳宏)
Research Institute of Trustworthy Autonomous Systems, Southern University of Science and Technology, Shenzhen 518055, China; Department of Computer Science and Engineering, Southern University of Science and Technology, Shenzhen 518055, China

XU Xiao-Yang (徐霄阳)
Research Institute of Trustworthy Autonomous Systems, Southern University of Science and Technology, Shenzhen 518055, China; Department of Computer Science and Engineering, Southern University of Science and Technology, Shenzhen 518055, China

KONG Fan-Chu (孔繁初)
Research Institute of Trustworthy Autonomous Systems, Southern University of Science and Technology, Shenzhen 518055, China; Department of Computer Science and Engineering, Southern University of Science and Technology, Shenzhen 518055, China

PENG Pai (彭湃)
ITEA Technologies Co. Ltd., Shenzhen 518067, China

ZHANG Zhao (张钊)
ITEA Technologies Co. Ltd., Shenzhen 518067, China

ZHANG Yu-Qun (张煜群)
Research Institute of Trustworthy Autonomous Systems, Southern University of Science and Technology, Shenzhen 518055, China; Department of Computer Science and Engineering, Southern University of Science and Technology, Shenzhen 518055, China

Abstract:

As software becomes pervasive, the rapid development and functional iteration of applications inevitably introduce software defects, which pose serious threats to program reliability and security. Detecting and repairing defects has therefore become an essential yet burdensome part of maintaining software quality. Over the past decades, software engineering researchers have proposed numerous techniques to help developers address defect-related problems. However, these techniques face serious challenges and have made little progress toward industrial adoption. Large language models (LLMs), such as the code model Codex and the conversational model ChatGPT, are trained on massive datasets and can capture complex patterns and structures in code, process extensive contextual information, and adapt flexibly to a variety of tasks; their strong performance has attracted considerable attention from researchers. In many software engineering tasks, LLM-based techniques show significant advantages and are promising for addressing key challenges that these domains have long faced. This survey therefore analyzes three defect detection domains in which mature LLM-based techniques already exist (deep-learning library defect detection, GUI automated testing, and automated test case generation), together with one mature defect repair domain, automated program repair (APR). It traces the development of these domains and discusses in depth the characteristics and challenges of the different technical approaches. Finally, based on the analysis of existing research, it summarizes the key challenges facing these domains and techniques and offers insights for future research.

Key words: large language model (LLM); defect detection; deep-learning library defect detection; automated test case generation; automated GUI testing; automated program repair

参考文献

[1] Gazzola L, Micucci D, Mariani L. Automatic software repair: A survey. In: Proc. of the 40th Int’l Conf. on Software Engineering. Gothenburg: ACM, 2018. 1219. [doi: 10.1145/3180155.3182526]

[2] Monperrus M. Automatic software repair: A bibliography. ACM Computing Surveys, 2018, 51(1): 17.

[3] Ayewah N, Pugh W, Hovemeyer D, Morgenthaler JD, Penix J. Using static analysis to find bugs. IEEE Software, 2008, 25(5): 22–29.

[4] Cole B, Hakim D, Hovemeyer D, Lazarus R, Pugh W, Stephens K. Improving your software using static analysis to find bugs. In: Proc. of the 21st ACM SIGPLAN Symp. on Object-oriented Programming Systems and Applications. Portland: ACM, 2006. 673–674. [doi: 10.1145/1176617.1176667]

[5] Pacheco C, Ernst MD. Randoop: Feedback-directed random testing for Java. In: Proc. of the 22nd ACM SIGPLAN Conf. on Object-oriented Programming Systems and Applications Companion. Montreal: ACM, 2007. 815–816. [doi: 10.1145/1297846.1297902]

[6] Fraser G, Arcuri A. EvoSuite: Automatic test suite generation for object-oriented software. In: Proc. of the 19th ACM SIGSOFT Symp. and the 13th European Conf. on Foundations of Software Engineering. Szeged: ACM, 2011. 416–419. [doi: 10.1145/2025113.2025179]

[7] DeMarco F, Xuan JF, Le Berre D, Monperrus M. Automatic repair of buggy if conditions and missing preconditions with SMT. In: Proc. of the 6th Int’l Workshop on Constraints in Software Testing, Verification, and Analysis. Hyderabad: ACM, 2014. 30–39. [doi: 10.1145/2593735.2593740]

[8] Chen LS, Pei Y, Furia CA. Contract-based program repair without the contracts. In: Proc. of the 32nd IEEE/ACM Int’l Conf. on Automated Software Engineering (ASE). Urbana: IEEE, 2017. 637–647. [doi: 10.1109/ASE.2017.8115674]

[9] Wu YH, Jiang AQ, Li WD, Rabe MN, Staats C, Jamnik M, Szegedy C. Autoformalization with large language models. In: Proc. of the 36th Int’l Conf. on Neural Information Processing Systems. New Orleans: Curran Associates Inc., 2022. 32353–32368.

[10] First E, Rabe MN, Ringer T, Brun Y. Baldur: Whole-proof generation and repair with large language models. arXiv:2303.04910, 2023.

[11] Kim D, Nam J, Song J, Kim S. Automatic patch generation learned from human-written patches. In: Proc. of the 35th Int’l Conf. on Software Engineering (ICSE). San Francisco: IEEE, 2013. 802–811. [doi: 10.1109/ICSE.2013.6606626]

[12] Hua JR, Zhang MS, Wang KY, Khurshid S. Towards practical program repair with on-demand candidate generation. In: Proc. of the 40th Int’l Conf. on Software Engineering. Gothenburg: ACM, 2018. 12–23.

[13] Radford A, Narasimhan K, Salimans T, Sutskever I. Improving language understanding by generative pre-training. 2018. https://hayate-lab.com/wp-content/uploads/2023/05/43372bfa750340059ad87ac8e538c53b.pdf

[14] Radford A, Wu J, Child R, Luan D, Amodei D, Sutskever I. Language models are unsupervised multitask learners. 2019. https://insightcivic.s3.us-east-1.amazonaws.com/language-models.pdf

[15] Devlin J, Chang MW, Lee K, Toutanova K. BERT: Pre-training of deep bidirectional Transformers for language understanding. arXiv:1810.04805, 2019.

[16] Liu YH, Ott M, Goyal N, Du JF, Joshi M, Chen DQ, Levy O, Lewis M, Zettlemoyer L, Stoyanov V. RoBERTa: A robustly optimized BERT pretraining approach. arXiv:1907.11692, 2019.

[17] ICSE 2023. 2023. https://conf.researchr.org/home/icse-2023

[18] ISSTA 2023. 2023. https://conf.researchr.org/home/issta-2023

[19] ASE 2023. 2023. https://conf.researchr.org/home/ase-2023

[20] ESEC/FSE 2023. 2023. https://conf.researchr.org/home/fse-2023

[21] Deng YL, Xia CS, Yang CY, Zhang SD, Yang SJ, Zhang LM. Large language models are edge-case fuzzers: Testing deep learning libraries via FuzzGPT. arXiv:2304.02014, 2023.

[22] Kang S, Yoon J, Yoo S. Large language models are few-shot testers: Exploring LLM-based general bug reproduction. In: Proc. of the 45th Int’l Conf. on Software Engineering. Melbourne: IEEE, 2023. 2312–2323. [doi: 10.1109/ICSE48619.2023.00194]

[23] Liu Z, Chen CY, Wang JJ, Chen MZ, Wu BY, Che X, Wang DD, Wang Q. Make LLM a testing expert: Bringing human-like interaction to mobile GUI testing via functionality-aware decisions. arXiv:2310.15780, 2023.

[24] Xia CS, Ding YF, Zhang LM. Revisiting the plastic surgery hypothesis via large language models. arXiv:2303.10494, 2023.

[25] Hou XY, Zhao YJ, Liu Y, Yang Z, Wang KL, Li L, Luo XP, Lo D, Grundy J, Wang HY. Large language models for software engineering: A systematic literature review. arXiv:2308.10620, 2024.

[26] Pan SR, Luo LH, Wang YF, Chen C, Wang JP, Wu XD. Unifying large language models and knowledge graphs: A roadmap. IEEE Trans. on Knowledge and Data Engineering, 2024, 36(7): 3580–3599.

[27] Yang CY, Deng YL, Lu RY, Yao JY, Liu JW, Jabbarvand R, Zhang LM. WhiteFox: White-box compiler fuzzing empowered by large language models. arXiv:2310.15991, 2024.

[28] Sun ML, Yang YB, Wang Y, Wen M, Jia HX, Zhou YM. SMT solver validation empowered by large pre-trained language models. In: Proc. of the 38th IEEE/ACM Int’l Conf. on Automated Software Engineering (ASE). Luxembourg: IEEE, 2023. 1288–1300.

[29] Xia CS, Paltenghi M, Le Tian J, Pradel M, Zhang LM. Fuzz4All: Universal fuzzing with large language models. In: Proc. of the 46th IEEE/ACM Int’l Conf. on Software Engineering. Lisbon: ACM, 2024. 126.

[30] Wohlin C. Guidelines for snowballing in systematic literature studies and a replication in software engineering. In: Proc. of the 18th Int’l Conf. on Evaluation and Assessment in Software Engineering. London: ACM, 2014. 38.

[31] 姜佳君, 陈俊洁, 熊英飞. 软件缺陷自动修复技术综述. 软件学报, 2021, 32(9): 2665–2690. http://www.jos.org.cn/1000-9825/6274.htm

Jiang JJ, Chen JJ, Xiong YF. Survey of automatic program repair techniques. Ruan Jian Xue Bao/Journal of Software, 2021, 32(9): 2665–2690 (in Chinese with English abstract). http://www.jos.org.cn/1000-9825/6274.htm

[32] Xia CS, Zhang LM. Less training, more repairing please: Revisiting automated program repair via zero-shot learning. In: Proc. of the 30th ACM Joint European Software Engineering Conf. and Symp. on the Foundations of Software Engineering. Singapore: ACM, 2022. 959–971. [doi: 10.1145/3540250.3549101]

[33] OpenAI platform. 2023. https://platform.openai.com

[34] Touvron H, Lavril T, Izacard G, Martinet X, Lachaux MA, Lacroix T, Rozière B, Goyal N, Hambro E, Azhar F, Rodriguez A, Joulin A, Grave E, Lample G. LLaMA: Open and efficient foundation language models. arXiv:2302.13971, 2023.

[35] Chowdhery A, Narang S, Devlin J, et al. PaLM: Scaling language modeling with pathways. arXiv:2204.02311, 2022.

[36] Vaswani A, Shazeer N, Parmar N, Uszkoreit J, Jones L, Gomez AN, Kaiser Ł, Polosukhin I. Attention is all you need. In: Proc. of the 31st Int’l Conf. on Neural Information Processing Systems. Long Beach: Curran Associates Inc., 2017. 6000–6010.

[37] Hoffmann J, Borgeaud S, Mensch A, et al. Training compute-optimal large language models. arXiv:2203.15556, 2022.

[38] Shanahan M. Talking about large language models. arXiv:2212.03551, 2023.

[39] Chen M, Tworek J, Jun H, et al. Evaluating large language models trained on code. arXiv:2107.03374, 2021.

[40] OpenAI. Introducing ChatGPT. 2023. https://openai.com/blog/chatgpt

[41] Raffel C, Shazeer N, Roberts A, Lee K, Narang S, Matena M, Zhou YQ, Li W, Liu PJ. Exploring the limits of transfer learning with a unified text-to-text Transformer. The Journal of Machine Learning Research, 2020, 21(1): 5485–5551.

[42] Lewis M, Liu YH, Goyal N, Ghazvininejad M, Mohamed A, Levy O, Stoyanov V, Zettlemoyer L. BART: Denoising sequence-to-sequence pre-training for natural language generation, translation, and comprehension. In: Proc. of the 58th Annual Meeting of the Association for Computational Linguistics. Association for Computational Linguistics, 2020. 7871–7880.

[43] Wang WH, Zhang YQ, Sui Y, Wan Y, Zhao Z, Wu J, Yu PS, Xu GD. Reinforcement-learning-guided source code summarization using hierarchical attention. IEEE Trans. on Software Engineering, 2022, 48(1): 102–119.

[44] Zeng ZR, Tan HZ, Zhang HT, Li J, Zhang YQ, Zhang LM. An extensive study on pre-trained models for program understanding and generation. In: Proc. of the 31st ACM SIGSOFT Int’l Symp. on Software Testing and Analysis. ACM, 2022. 39–51. [doi: 10.1145/3533767.3534390]

[45] Huang K, Meng XY, Zhang J, Liu Y, Wang WJ, Li SH, Zhang YQ. An empirical study on fine-tuning large language models of code for automated program repair. In: Proc. of the 38th IEEE/ACM Int’l Conf. on Automated Software Engineering (ASE). Luxembourg: IEEE, 2023. 1162–1174. [doi: 10.1109/ASE56229.2023.00181]

[46] Nashid N, Sintaha M, Mesbah A. Retrieval-based prompt selection for code-related few-shot learning. In: Proc. of the 45th IEEE/ACM Int’l Conf. on Software Engineering. Melbourne: IEEE, 2023. 2450–2462. [doi: 10.1109/ICSE48619.2023.00205]

[47] Yuan ZQ, Lou YL, Liu MW, Ding SJ, Wang KX, Chen YX, Peng X. No more manual tests? Evaluating and improving ChatGPT for unit test generation. arXiv:2305.04207, 2024.

[48] Wei J, Wang XZ, Schuurmans D, Bosma M, Ichter B, Xia F, Chi E, Le QV, Zhou D. Chain-of-thought prompting elicits reasoning in large language models. arXiv:2201.11903, 2023.

[49] Fakhoury S, Chakraborty S, Musuvathi M, Lahiri SK. Towards generating functionally correct code edits from natural language issue descriptions. arXiv:2304.03816, 2023.

[50] Deng YL, Xia CS, Peng HR, Yang CY, Zhang LM. Large language models are zero-shot fuzzers: Fuzzing deep-learning libraries via large language models. In: Proc. of the 32nd ACM SIGSOFT Int’l Symp. on Software Testing and Analysis. Seattle: ACM, 2023. 423–435. [doi: 10.1145/3597926.3598067]

[51] Prenner JA, Robbes R. Automatic program repair with OpenAI’s codex: Evaluating QuixBugs. arXiv:2111.03922, 2021.

[52] Xia CS, Wei YX, Zhang LM. Practical program repair in the era of large pre-trained language models. arXiv:2210.14179, 2022.

[53] Döderlein JB, Acher M, Khelladi DE, Combemale B. Piloting copilot and codex: Hot temperature, cold prompts, or black magic? arXiv:2210.14699, 2023.

[54] Pudari R, Ernst NA. From copilot to pilot: Towards ai supported software development. arXiv:2303.04142, 2023.

[55] Ziegler DM, Stiennon N, Wu J, Brown TB, Radford A, Amodei D, Christiano P, Irving G. Fine-tuning language models from human preferences. arXiv:1909.08593, 2020.

[56] Sobania D, Briesch M, Hanna C, Petke J. An analysis of the automatic bug fixing performance of ChatGPT. arXiv:2301.08653, 2023.

[57] Xia CS, Zhang LM. Keep the conversation going: Fixing 162 out of 337 bugs for $0.42 each using ChatGPT. arXiv:2304.00385, 2023.

[58] Wang JJ, Huang YC, Chen CY, Liu Z, Wang S, Wang Q. Software testing with large language models: Survey, landscape, and vision. arXiv:2307.07221, 2024.

[59] Sun Y, Chen YH, Wang XG, Tang XO. Deep learning face representation by joint identification-verification. In: Proc. of the 27th Int’l Conf. on Neural Information Processing Systems—Vol. 2. Montreal: MIT Press, 2014. 1988–1996.

[60] Julian KD, Lopez J, Brush JS, Owen MP, Kochenderfer MJ. Policy compression for aircraft collision avoidance systems. In: Proc. of the 35th IEEE/AIAA Digital Avionics Systems Conf. (DASC). Sacramento: IEEE, 2016. 1–10. [doi: 10.1109/DASC.2016.7778091]

[61] Liu SQ, Liu SD, Cai WD, Pujol S, Kikinis R, Feng DG. Early diagnosis of Alzheimer’s disease with deep learning. In: Proc. of the 11th IEEE Int’l Symp. on Biomedical Imaging (ISBI). Beijing: IEEE, 2014. 1015–1018. [doi: 10.1109/ISBI.2014.6868045]

[62] Chen CY, Seff A, Kornhauser A, Xiao JX. DeepDriving: Learning affordance for direct perception in autonomous driving. In: Proc. of the 2015 IEEE Int’l Conf. on Computer Vision (ICCV). Santiago: IEEE, 2015. 2722–2730. [doi: 10.1109/ICCV.2015.312]

[63] ABC7.com. Uber gives up testing of self-driving cars in California in wake of fatal Arizona crash. 2018. https://abc7.com/self-driving-uber-crash-video-pedestrian-hit-by-car-autonomous-vehicles/3269690/

[64] Self-driving car. Wikipedia, 2023. https://en.wikipedia.org/wiki/Self-driving_car

[65] Pham HV, Lutellier T, Qi WZ, Tan L. CRADLE: Cross-backend validation to detect and localize bugs in deep learning libraries. In: Proc. of the 41st IEEE/ACM Int’l Conf. on Software Engineering (ICSE). Montreal: IEEE, 2019. 1027–1038. [doi: 10.1109/ICSE.2019.00107]

[66] Wang Z, Yan M, Chen JJ, Liu S, Zhang DD. Deep learning library testing via effective model generation. In: Proc. of the 28th ACM Joint Meeting on European Software Engineering Conf. and Symp. on the Foundations of Software Engineering. ACM, 2020. 788–799. [doi: 10.1145/3368089.3409761]

[67] Wei AJ, Deng YL, Yang CY, Zhang LM. Free lunch for testing: Fuzzing deep-learning libraries from open source. In: Proc. of the 44th Int’l Conf. on Software Engineering. Pittsburgh: ACM, 2022. 995–1007.

[68] Guo QY, Xie XF, Li Y, Zhang XY, Liu Y, Li XH, Shen C. Audee: Automated testing for deep learning frameworks. In: Proc. of the 35th IEEE/ACM Int’l Conf. on Automated Software Engineering. ACM, 2021. 486–498. [doi: 10.1145/3324884.3416571]

[69] Gu JZ, Luo XC, Zhou YF, Wang X. Muffin: Testing deep learning libraries via neural architecture fuzzing. In: Proc. of the 44th Int’l Conf. on Software Engineering. Pittsburgh: ACM, 2022. 1418–1430. [doi: 10.1145/3510003.3510092]

[70] Xie DN, Li YT, Kim M, Pham HV, Tan L, Zhang XY, Godfrey MW. DocTer: Documentation-guided fuzzing for testing deep learning API functions. In: Proc. of the 31st ACM SIGSOFT Int’l Symp. on Software Testing and Analysis. ACM, 2022. 176–188. [doi: 10.1145/3533767.3534220]

[71] Deng YL, Yang CY, Wei AJ, Zhang LM. Fuzzing deep-learning libraries via automated relational API inference. In: Proc. of the 30th ACM Joint European Software Engineering Conf. and Symp. on the Foundations of Software Engineering. Singapore: ACM, 2022. 44–56. [doi: 10.1145/3540250.3549085]

[72] APP store (apple). Wikipedia, 2023. https://en.wikipedia.org/wiki/App_Store_(Apple)

[73] Google play. Wikipedia, 2023. https://en.wikipedia.org/wiki/Google_Play

[74] 杨艺, 王嬉, 赵春蕾, 步志亮. Android GUI自动化测试综述. 计算机科学, 2022, 49(S2): 756–765.

Yang Y, Wang X, Zhao CL, Bu ZL. Overview of Android GUI automated testing. Computer Science, 2022, 49(S2): 756–765 (in Chinese with English abstract).

[75] Yu SC, Fang CR, Tuo ZY, Zhang QJ, Chen CY, Chen ZY, Su ZD. Vision-based mobile APP GUI testing: A survey. arXiv:2310.13518, 2023.

[76] UI/application exerciser monkey, Android studio. Android developers. 2023. https://developer.android.com/studio/test/other-testing-tools/monkey

[77] Machiry A, Tahiliani R, Naik M. Dynodroid: An input generation system for Android APPs. In: Proc. of the 9th Joint Meeting on Foundations of Software Engineering. Saint Petersburg: ACM, 2013. 224–234.

[78] Hao S, Liu B, Nath S, Halfond WGJ, Govindan R. PUMA: Programmable UI-automation for large-scale dynamic analysis of mobile APPs. In: Proc. of the 12th Annual Int’l Conf. on Mobile Systems, Applications, and Services. Bretton: ACM, 2014. 204–217. [doi: 10.1145/2594368.2594390]

[79] Yang W, Prasad MR, Xie T. A grey-box approach for automated GUI-model generation of mobile applications. In: Proc. of the 16th Int’l Conf. on Fundamental Approaches to Software Engineering. Rome: Springer, 2013. 250–265. [doi: 10.1007/978-3-642-37057-1_19]

[80] Choi W, Necula G, Sen K. Guided GUI testing of Android APPs with minimal restart and approximate learning. In: Proc. of the 2013 ACM SIGPLAN Int’l Conf. on Object Oriented Programming Systems Languages & Applications. Indianapolis: ACM, 2013. 623–640. [doi: 10.1145/2509136.2509552]

[81] Mirzaei N, Garcia J, Bagheri H, Sadeghi A, Malek S. Reducing combinatorics in GUI testing of Android applications. In: Proc. of the 38th Int’l Conf. on Software Engineering. Austin: ACM, 2016. 559–570.

[82] Li YC, Yang ZY, Guo Y, Chen XQ. DroidBot: A lightweight UI-guided test input generator for Android. In: Proc. of the 39th IEEE/ACM Int’l Conf. on Software Engineering Companion (ICSE-C). Buenos Aires: IEEE, 2017. 23–26. [doi: 10.1109/ICSE-C.2017.8]

[83] Su T, Meng GZ, Chen YT, Wu K, Yang WM, Yao Y, Pu GG, Liu Y, Su ZD. Guided, stochastic model-based GUI testing of Android APPs. In: Proc. of the 11th Joint Meeting on Foundations of Software Engineering. Paderborn: ACM, 2017. 245–256. [doi: 10.1145/3106237.3106298]

[84] Gibbs sampling. Wikipedia, 2023. https://en.wikipedia.org/wiki/Gibbs_sampling

[85] Gu TX, Sun CN, Ma XX, Cao C, Xu C, Yao Y, Zhang QR, Lu J, Su ZD. Practical GUI testing of Android applications via model abstraction and refinement. In: Proc. of the 41st IEEE/ACM Int’l Conf. on Software Engineering (ICSE). Montreal: IEEE, 2019. 269–280. [doi: 10.1109/ICSE.2019.00042]

[86] Cai TQ, Zhang Z, Yang P. Fastbot: A multi-agent model-based test generation system. In: Proc. of the 1st IEEE/ACM Int’l Conf. on Automation of Software Test. Seoul: ACM, 2020. 93–96. [doi: 10.1145/3387903.3389308]

[87] Wang J, Jiang YY, Xu C, Cao C, Ma XX, Lu J. ComboDroid: Generating high-quality test inputs for Android APPs via use case combinations. In: Proc. of the 42nd ACM/IEEE Int’l Conf. on Software Engineering. Seoul: ACM, 2020. 469–480. [doi: 10.1145/3377811.3380382]

[88] Liu Z, Chen CY, Wang JJ, Huang YK, Hu J, Wang Q. Guided bug crush: Assist manual GUI testing of Android APPs via hint moves. In: Proc. of the 2022 CHI Conf. on Human Factors in Computing Systems. New Orleans: ACM, 2022. 557. [doi: 10.1145/3491102.3501903]

[89] Anand S, Naik M, Harrold MJ, Yang H. Automated concolic testing of smartphone APPs. In: Proc. of the 20th ACM SIGSOFT Int’l Symp. on the Foundations of Software Engineering. Cary: ACM, 2012. 59.

[90] Amalfitano D, Fasolino AR, Tramontana P, De Carmine S, Memon AM. Using GUI ripping for automated testing of Android applications. In: Proc. of the 27th IEEE/ACM Int’l Conf. on Automated Software Engineering. Essen: ACM, 2012. 258–261. [doi: 10.1145/2351676.2351717]

[91] Azim T, Neamtiu I. Targeted and depth-first exploration for systematic testing of Android APPs. In: Proc. of the 2013 ACM SIGPLAN Int’l Conf. on Object Oriented Programming Systems Languages & Applications. Indianapolis: ACM, 2013. 641–660.

[92] Mahmood R, Mirzaei N, Malek S. EvoDroid: Segmented evolutionary testing of Android APPs. In: Proc. of the 22nd ACM SIGSOFT Int’l Symp. on Foundations of Software Engineering. Hong Kong: ACM, 2014. 599–609. [doi: 10.1145/2635868.2635896]

[93] Mao K, Harman M, Jia Y. Sapienz: Multi-objective automated testing for Android applications. In: Proc. of the 25th Int’l Symp. on Software Testing and Analysis. Saarbrücken: ACM, 2016. 94–105. [doi: 10.1145/2931037.2931054]

[94] Dong Z, Böhme M, Cojocaru L, Roychoudhury A. Time-travel testing of Android APPs. In: Proc. of the 42nd ACM/IEEE Int’l Conf. on Software Engineering. Seoul: ACM, 2020. 481–492. [doi: 10.1145/3377811.3380402]

[95] Li YC, Yang ZY, Guo Y, Chen XQ. Humanoid: A deep learning-based approach to automated black-box Android APP testing. In: Proc. of the 34th IEEE/ACM Int’l Conf. on Automated Software Engineering (ASE). San Diego: IEEE, 2019. 1070–1073.

[96] Pan MX, Huang A, Wang GX, Zhang T, Li XD. Reinforcement learning based curiosity-driven testing of Android applications. In: Proc. of the 29th ACM SIGSOFT Int’l Symp. on Software Testing and Analysis. ACM, 2020. 153–164. [doi: 10.1145/3395363.3397354]

[97] Peng C, Zhang Z, Lv ZW, Yang P. MUBot: Learning to test large-scale commercial Android APPs like a human. In: Proc. of the 2022 IEEE Int’l Conf. on Software Maintenance and Evolution (ICSME). Limassol: IEEE, 2022. 543–552.

[98] Ngiam J, Khosla A, Kim M, Nam J, Lee H, Ng AY. Multimodal deep learning. In: Proc. of the 28th Int’l Conf. on Machine Learning. Bellevue: Omnipress, 2011. 689–696.

[99] YazdaniBanafsheDaragh F, Malek S. Deep GUI: Black-box GUI input generation with deep learning. In: Proc. of the 36th IEEE/ACM Int’l Conf. on Automated Software Engineering (ASE). Melbourne: IEEE, 2021. 905–916. [doi: 10.1109/ASE51524.2021.9678778]

[100] Liu Z, Chen CY, Wang JJ, Chen MZ, Wu BY, Che X, Wang DD, Wang Q. Chatting with GPT-3 for zero-shot human-like mobile automated GUI testing. arXiv:2305.09434, 2023.

[101] Liu P, Zhang XY, Pistoia M, Zheng YH, Marques M, Zeng LF. Automatic text input generation for mobile testing. In: Proc. of the 39th IEEE/ACM Int’l Conf. on Software Engineering (ICSE). Buenos Aires: IEEE, 2017. 643–653. [doi: 10.1109/ICSE.2017.65]

[102] Liu Z, Chen CY, Wang JJ, Che X, Huang YK, Hu J, Wang Q. Fill in the blank: Context-aware automated text input generation for mobile GUI testing. In: Proc. of the 45th IEEE/ACM Int’l Conf. on Software Engineering (ICSE). Melbourne: IEEE, 2023. 1355–1367. [doi: 10.1109/ICSE48619.2023.00119]

[103] Wu TY, Deng X, Yan J, Zhang J. Analyses for specific defects in Android applications: A survey. Frontiers of Computer Science, 2019, 13(6): 1210–1227.

[104] Brown TB, Mann B, Ryder N, et al. Language models are few-shot learners. In: Proc. of the 34th Int’l Conf. on Neural Information Processing Systems. Vancouver: Curran Associates Inc., 2020. 1877–1901.

[105] Liu Z, Chen CY, Wang JJ, Chen MZ, Wu BY, Che X, Wang DD, Wang Q. Testing the limits: Unusual text inputs generation for mobile APP crash detection with large language model. arXiv:2310.15657, 2023.

[106] Linares-Vásquez M, Bernal-Cardenas C, Moran K, Poshyvanyk D. How do developers test Android applications? In: Proc. of the 2017 IEEE Int’l Conf. on Software Maintenance and Evolution (ICSME). Shanghai: IEEE, 2017. 613–622. [doi: 10.1109/ICSME.2017.47]

[107] Yoon J, Feldt R, Yoo S. Autonomous large language model agents enabling intent-driven mobile GUI testing. arXiv:2311.08649, 2023.

[108] Nass M, Alegroth E, Feldt R. Improving Web element localization by using a large language model. arXiv:2310.02046, 2023.

[109] Wen H, Wang HM, Liu JX, Li YC. DroidBot-GPT: GPT-powered UI automation for Android. arXiv:2304.07061, 2024.

[110] Almasi MM, Hemmati H, Fraser G, Arcuri A, Benefelds J. An industrial evaluation of unit test generation: Finding real faults in a financial application. In: Proc. of the 39th IEEE/ACM Int’l Conf. on Software Engineering: Software Engineering in Practice Track. Buenos Aires: IEEE, 2017. 263–272. [doi: 10.1109/ICSE-SEIP.2017.27]

[111] Shore J, Warden S. The Art of Agile Development. 2nd ed., Sebastopol: O’Reilly Media, 2021.

[112] Beck K. Extreme Programming Explained: Embrace Change. Boston: Addison-Wesley Longman Publishing Co. Inc., 1999.

[113] Daka E, Fraser G. A survey on unit testing practices and problems. In: Proc. of the 25th IEEE Int’l Symp. on Software Reliability Engineering. Naples: IEEE, 2014. 201–211. [doi: 10.1109/ISSRE.2014.11]

[114] Daka E, Campos J, Fraser G, Dorn J, Weimer W. Modeling readability to improve unit tests. In: Proc. of the 10th Joint Meeting on Foundations of Software Engineering. Bergamo: ACM, 2015. 107–118.

[115] Lin Y, Ong YS, Sun J, Fraser G, Dong JS. Graph-based seed object synthesis for search-based unit testing. In: Proc. of the 29th ACM Joint Meeting on European Software Engineering Conf. and Symp. on the Foundations of Software Engineering. Athens: ACM, 2021. 1068–1080. [doi: 10.1145/3468264.3468619]

[116] Fraser G, Arcuri A. Sound empirical evidence in software testing. In: Proc. of the 34th Int’l Conf. on Software Engineering. Zurich: IEEE, 2012. 178–188. [doi: 10.1109/ICSE.2012.6227195]

[117] Ernst MD. Randoop: Automatic unit test generation for Java. 2023. https://randoop.github.io/randoop/

[118] Selakovic M, Pradel M, Karim R, Tip F. Test generation for higher-order functions in dynamic languages. Proc. of the ACM on Programming Languages, 2018, 2: 161.

[119] Arteca E, Harner S, Pradel M, Tip F. Nessie: Automatically testing JavaScript APIs with asynchronous callbacks. In: Proc. of the 44th Int’l Conf. on Software Engineering. Pittsburgh: ACM, 2022. 1494–1505.

[120] Ernst MD, Perkins JH, Guo PJ, McCamant S, Pacheco C, Tschantz MS, Xiao C. The Daikon system for dynamic detection of likely invariants. Science of Computer Programming, 2007, 69(1–3): 35–45.

[121] Csallner C, Tillmann N, Smaragdakis Y. DySy: Dynamic symbolic execution for invariant inference. In: Proc. of the 30th Int’l Conf. on Software Engineering. Leipzig: ACM, 2008. 281–290. [doi: 10.1145/1368088.1368127]

[122] Molina F, Ponzio P, Aguirre N, Frias M. EvoSpex: An evolutionary algorithm for learning postconditions. In: Proc. of the 43rd IEEE/ACM Int’l Conf. on Software Engineering (ICSE). Madrid: IEEE, 2021. 1223–1235. [doi: 10.1109/ICSE43902.2021.00112]

[123] Palomba F, Di Nucci D, Panichella A, Oliveto R, De Lucia A. On the diffusion of test smells in automatically generated test code: An empirical study. In: Proc. of the 9th Int’l Workshop on Search-based Software Testing. Austin: ACM, 2016. 5–14.

[124] Watson C, Tufano M, Moran K, Bavota G, Poshyvanyk D. On learning meaningful assert statements for unit test cases. In: Proc. of the 42nd ACM/IEEE Int’l Conf. on Software Engineering. Seoul: ACM, 2020. 1398–1409. [doi: 10.1145/3377811.3380429]

[125] Mastropaolo A, Scalabrino S, Cooper N, Palacio DN, Poshyvanyk D, Oliveto R, Bavota G. Studying the usage of text-to-text transfer Transformer to support code-related tasks. In: Proc. of the 43rd IEEE/ACM Int’l Conf. on Software Engineering. Madrid: IEEE, 2021. 336–347. [doi: 10.1109/ICSE43902.2021.00041]

[126] Mastropaolo A, Cooper N, Palacio DN, Scalabrino S, Poshyvanyk D, Oliveto R, Bavota G. Using transfer learning for code-related tasks. IEEE Trans. on Software Engineering, 2023, 49(4): 1580–1598.

[127] Yu H, Lou YL, Sun K, Ran DZ, Xie T, Hao D, Li Y, Li G, Wang QX. Automated assertion generation via information retrieval and its integration with deep learning. In: Proc. of the 44th Int’l Conf. on Software Engineering. Pittsburgh: ACM, 2022. 163–174. [doi: 10.1145/3510003.3510149]

[128] Nie PY, Banerjee R, Li JJ, Mooney RJ, Gligoric M. Learning deep semantics for test completion. In: Proc. of the 45th IEEE/ACM Int’l Conf. on Software Engineering. Melbourne: IEEE, 2023. 2111–2123. [doi: 10.1109/ICSE48619.2023.00178]

[129] Tufano M, Drain D, Svyatkovskiy A, Sundaresan N. Generating accurate assert statements for unit test cases using pretrained Transformers. In: Proc. of the 3rd ACM/IEEE Int’l Conf. on Automation of Software Test. Pittsburgh: ACM, 2022. 54–64. [doi: 10.1145/3524481.3527220]

[130] Dinella E, Ryan G, Mytkowicz T, Lahiri SK. TOGA: A neural method for test oracle generation. In: Proc. of the 44th Int’l Conf. on Software Engineering. Pittsburgh: ACM, 2022. 2130–2141. [doi: 10.1145/3510003.3510141]

[131] Liu ZX, Liu K, Xia X, Yang XH. Towards more realistic evaluation for neural test oracle generation. In: Proc. of the 32nd ACM SIGSOFT Int’l Symp. on Software Testing and Analysis. Seattle: ACM, 2023. 589–600.

[132] Tufano M, Drain D, Svyatkovskiy A, Deng SK, Sundaresan N. Unit test case generation with Transformers and focal context. aarXiv:2009.05617, 2021.

[133] Panichella A, Panichella S, Fraser G, Sawant AA, Hellendoorn VJ. Revisiting test smells in automatically generated tests: Limitations, pitfalls, and opportunities. In: Proc. of the 2020 IEEE Int’l Conf. on Software Maintenance and Evolution (ICSME). Adelaide: IEEE, 2020. 523–533. [doi: 10.1109/ICSME46990.2020.00056]

[134] Lemieux C, Inala JP, Lahiri SK, Sen S. CodaMosa: Escaping coverage plateaus in test generation with pre-trained large language models. In: Proc. of the 45th IEEE/ACM Int’l Conf. on Software Engineering. Melbourne: IEEE, 2023. 919–931.

[135] Schäfer M, Nadi S, Eghbali A, Tip F. An empirical evaluation of using large language models for automated unit test generation. arXiv:2302.06527, 2023.

[136] Xie ZK, Chen YH, Zhi C, Deng SG, Yin JW. ChatUniTest: A ChatGPT-based automated unit test generation tool. arXiv:2305.04764, 2024.

[137] Chen B, Zhang FJ, Nguyen A, Zan DG, Lin ZQ, Lou JG, Chen WZ. CodeT: Code generation with generated tests. arXiv:2207.10397, 2022.

[138] Lahiri SK, Fakhoury S, Naik A, Sakkas G, Chakraborty S, Musuvathi M, Choudhury P, Von Veh C, Inala JP, Wang CL, Gao JF. Interactive code generation via test-driven user-intent formalization. arXiv:2208.05950, 2023.

[139] Mankowitz DJ, Michi A, Zhernov A, et al. Faster sorting algorithms discovered using deep reinforcement learning. Nature, 2023, 618(7964): 257–263.

[140] GPT-4 “discovered” the same sorting algorithm as AlphaDev by removing “mov s p”. Hacker News, 2024. https://news.ycombinator.com/item?id=36247549

[141] The New York Times. A smarter app is watching your wallet. 2023. https://www.nytimes.com/2021/03/09/business/apps-personal-finance-budget.html

[142] Webster RW, Hess D. A real-time software controller for a digital model railroad system. In: Proc. of the 1993 IEEE Workshop on Real-time Applications. New York: IEEE, 1993. 126–130. [doi: 10.1109/RTA.1993.263102]

[143] Brown D. Hospitals turn to artificial intelligence to help with an age-old problem: Doctors’ poor bedside manners. Washington Post, 2021. https://www.washingtonpost.com/technology/2021/02/16/virtual-ai-hospital-patients

[144] Liblit B, Aiken A, Zheng AX, Jordan MI. Bug isolation via remote program sampling. ACM SIGPLAN Notices, 2003, 38(5): 141–154.

[145] Just R, Jalali D, Ernst MD. Defects4J: A database of existing faults to enable controlled testing studies for Java programs. In: Proc. of the 2014 Int’l Symp. on Software Testing and Analysis. San Jose: ACM, 2014. 437–440.

[146] Lin D, Koppel J, Chen A, Solar-Lezama A. QuixBugs: A multi-lingual program repair benchmark set based on the Quixey challenge. In: Proc. of the 2017 ACM SIGPLAN Int’l Conf. on Systems, Programming, Languages, and Applications: Software for Humanity. Vancouver: ACM, 2017. 55–56. [doi: 10.1145/3135932.3135941]

[147] Le Goues C, Holtschulte N, Smith EK, Brun Y, Devanbu P, Forrest S, Weimer W. The ManyBugs and IntroClass benchmarks for automated repair of C programs. IEEE Trans. on Software Engineering, 2015, 41(12): 1236–1256.

[148] Mao XG, Lei Y, Dai ZY, Qi YH, Wang CS. Slice-based statistical fault localization. Journal of Systems and Software, 2014, 89: 51–62.

[149] Wong WE, Debroy V, Choi B. A family of code coverage-based heuristics for effective fault localization. Journal of Systems and Software, 2010, 83(2): 188–208.

[150] Abreu R, Zoeteweij P, van Gemund AJC. Spectrum-based multiple fault localization. In: Proc. of the 2009 IEEE/ACM Int’l Conf. on Automated Software Engineering. Auckland: IEEE, 2009. 88–99. [doi: 10.1109/ASE.2009.25]

[151] Perez A, Abreu R, van Deursen A. A test-suite diagnosability metric for spectrum-based fault localization approaches. In: Proc. of the 39th IEEE/ACM Int’l Conf. on Software Engineering (ICSE). Buenos Aires: IEEE, 2017. 654–664. [doi: 10.1109/ICSE.2017.66]

[152] Liblit B, Naik M, Zheng AX, Aiken A, Jordan MI. Scalable statistical bug isolation. ACM SIGPLAN Notices, 2005, 40(6): 15–26.

[153] Liu C, Fei L, Yan XF, Han JW, Midkiff SP. Statistical debugging: A hypothesis testing-based approach. IEEE Trans. on Software Engineering, 2006, 32(10): 831–848.

[154] Jones JA, Harrold MJ. Empirical evaluation of the Tarantula automatic fault-localization technique. In: Proc. of the 20th IEEE/ACM Int’l Conf. on Automated Software Engineering. Long Beach: ACM, 2005. 273–282.

[155] Abreu R, Zoeteweij P, Golsteijn R, van Gemund AJC. A practical evaluation of spectrum-based fault localization. Journal of Systems and Software, 2009, 82(11): 1780–1792.

[156] Campos J, Riboira A, Perez A, Abreu R. GZoltar: An Eclipse plug-in for testing and debugging. In: Proc. of the 27th IEEE/ACM Int’l Conf. on Automated Software Engineering. Essen: ACM, 2012. 378–381.

[157] Pearson S, Campos J, Just R, Fraser G, Abreu R, Ernst MD, Pang D, Keller B. Evaluating and improving fault localization. In: Proc. of the 39th IEEE/ACM Int’l Conf. on Software Engineering (ICSE). Buenos Aires: IEEE, 2017. 609–620. [doi: 10.1109/ICSE.2017.62]

[158] Steimann F, Frenkel M, Abreu R. Threats to the validity and value of empirical assessments of the accuracy of coverage-based fault locators. In: Proc. of the 2013 Int’l Symp. on Software Testing and Analysis. Lugano: ACM, 2013. 314–324. [doi: 10.1145/2483760.2483767]

[159] Xie XY, Chen TY, Kuo FC, Xu BW. A theoretical analysis of the risk evaluation formulas for spectrum-based fault localization. ACM Trans. on Software Engineering and Methodology, 2013, 22(4): 31.

[160] Xuan JF, Monperrus M. Learning to combine multiple ranking metrics for fault localization. In: Proc. of the 2014 IEEE Int’l Conf. on Software Maintenance and Evolution. Victoria: IEEE, 2014. 191–200. [doi: 10.1109/ICSME.2014.41]

[161] Liu K, Koyuncu A, Bissyandé TF, Kim D, Klein J, Le Traon Y. You cannot fix what you cannot find! An investigation of fault localization bias in benchmarking automated program repair systems. In: Proc. of the 12th IEEE Conf. on Software Testing, Validation and Verification (ICST). Xi’an: IEEE, 2019. 102–113. [doi: 10.1109/ICST.2019.00020]

[162] Xiong YF, Wang J, Yan RF, Zhang JC, Han S, Huang G, Zhang L. Precise condition synthesis for program repair. In: Proc. of the 39th IEEE/ACM Int’l Conf. on Software Engineering (ICSE). Buenos Aires: IEEE, 2017. 416–426. [doi: 10.1109/ICSE.2017.45]

[163] Zhang XY, Gupta N, Gupta R. Locating faults through automated predicate switching. In: Proc. of the 28th Int’l Conf. on Software Engineering. Shanghai: ACM, 2006. 272–281. [doi: 10.1145/1134285.1134324]

[164] Jiang JJ, Xiong YF, Zhang HY, Gao Q, Chen XQ. Shaping program repair space with existing patches and similar code. In: Proc. of the 27th ACM SIGSOFT Int’l Symp. on Software Testing and Analysis. Amsterdam: ACM, 2018. 298–309. [doi: 10.1145/3213846.3213871]

[165] Xuan JF, Monperrus M. Test case purification for improving fault localization. In: Proc. of the 22nd ACM SIGSOFT Int’l Symp. on Foundations of Software Engineering. Hong Kong: ACM, 2014. 52–63.

[166] Yang AZH, Le Goues C, Martins R, Hellendoorn V. Large language models for test-free fault localization. In: Proc. of the 46th IEEE/ACM Int’l Conf. on Software Engineering. Lisbon: ACM, 2024. 17. [doi: 10.1145/3597503.3623342]

[167] Wong WE, Gao RZ, Li YH, Abreu R, Wotawa F. A survey on software fault localization. IEEE Trans. on Software Engineering, 2016, 42(8): 707–740.

[168] Zakari A, Lee SP, Abreu R, Ahmed BH, Rasheed RA. Multiple fault localization of software programs: A systematic literature review. Information and Software Technology, 2020, 124: 106312.

[169] De Souza HA, Chaim ML, Kon F. Spectrum-based software fault localization: A survey of techniques, advances, and challenges. arXiv:1607.04347, 2017.

[170] Liu K, Koyuncu A, Kim D, Bissyandé TF. TBar: Revisiting template-based automated program repair. In: Proc. of the 28th ACM SIGSOFT Int’l Symp. on Software Testing and Analysis. Beijing: ACM, 2019. 31–42.

[171] Xia CS, Wei YX, Zhang LM. Automated program repair in the era of large pre-trained language models. In: Proc. of the 45th IEEE/ACM Int’l Conf. on Software Engineering (ICSE). Melbourne: IEEE, 2023. 1482–1494. [doi: 10.1109/ICSE48619.2023.00129]

[172] Ghanbari A, Benton S, Zhang LM. Practical program repair via bytecode mutation. In: Proc. of the 28th ACM SIGSOFT Int’l Symp. on Software Testing and Analysis. Beijing: ACM, 2019. 19–30. [doi: 10.1145/3293882.3330559]

[173] Liu BB, Dong W, Wang J. Survey on intelligent search and construction methods of program. Ruan Jian Xue Bao/Journal of Software, 2018, 29(8): 2180–2197 (in Chinese with English abstract). http://www.jos.org.cn/1000-9825/5529.htm

[174] Le Goues C, Nguyen T, Forrest S, Weimer W. GenProg: A generic method for automatic software repair. IEEE Trans. on Software Engineering, 2012, 38(1): 54–72.

[175] Wen M, Chen JJ, Wu RX, Hao D, Cheung SC. Context-aware patch generation for better automated program repair. In: Proc. of the 40th Int’l Conf. on Software Engineering. Gothenburg: ACM, 2018. 1–11. [doi: 10.1145/3180155.3180233]

[176] Nguyen HDT, Qi DW, Roychoudhury A, Chandra S. SemFix: Program repair via semantic analysis. In: Proc. of the 35th Int’l Conf. on Software Engineering (ICSE). San Francisco: IEEE, 2013. 772–781. [doi: 10.1109/ICSE.2013.6606623]

[177] Jones JA, Harrold MJ, Stasko J. Visualization of test information to assist fault localization. In: Proc. of the 24th Int’l Conf. on Software Engineering. Orlando: ACM, 2002. 467–477. [doi: 10.1145/581339.581397]

[178] Lutellier T, Pham HV, Pang L, Li YT, Wei MS, Tan L. CoCoNuT: Combining context-aware neural translation models using ensemble for program repair. In: Proc. of the 29th ACM SIGSOFT Int’l Symp. on Software Testing and Analysis. ACM, 2020. 101–114. [doi: 10.1145/3395363.3397369]

[179] Jiang N, Lutellier T, Tan L. CURE: Code-aware neural machine translation for automatic program repair. In: Proc. of the 43rd IEEE/ACM Int’l Conf. on Software Engineering (ICSE). Madrid: IEEE, 2021. 1161–1173. [doi: 10.1109/ICSE43902.2021.00107]

[180] Zhu QH, Sun ZY, Xiao YA, Zhang WJ, Yuan K, Xiong YF, Zhang L. A syntax-guided edit decoder for neural program repair. In: Proc. of the 29th ACM Joint Meeting on European Software Engineering Conf. and Symp. on the Foundations of Software Engineering. Athens: ACM, 2021. 341–353. [doi: 10.1145/3468264.3468544]

[181] Ye H, Martinez M, Monperrus M. Neural program repair with execution-based backpropagation. In: Proc. of the 44th Int’l Conf. on Software Engineering. Pittsburgh: ACM, 2022. 1506–1518. [doi: 10.1145/3510003.3510222]

[182] Jiang N, Lutellier T, Lou YL, Tan L, Goldwasser D, Zhang XY. KNOD: Domain knowledge distilled tree decoder for automated program repair. In: Proc. of the 45th IEEE/ACM Int’l Conf. on Software Engineering. Melbourne: IEEE, 2023. 1251–1263. [doi: 10.1109/ICSE48619.2023.00111]

[183] Yue RR, Meng N, Wang QX. A characterization study of repeated bug fixes. In: Proc. of the 2017 IEEE Int’l Conf. on Software Maintenance and Evolution (ICSME). Shanghai: IEEE, 2017. 422–432. [doi: 10.1109/ICSME.2017.16]

[184] Dallmeier V, Zimmermann T. Extraction of bug localization benchmarks from history. In: Proc. of the 22nd IEEE/ACM Int’l Conf. on Automated Software Engineering. Atlanta: ACM, 2007. 433–436. [doi: 10.1145/1321631.1321702]

[185] Jiang JJ, Ren LY, Xiong YF, Zhang LM. Inferring program transformations from singular examples via big code. In: Proc. of the 34th IEEE/ACM Int’l Conf. on Automated Software Engineering (ASE). San Diego: IEEE, 2019. 255–266. [doi: 10.1109/ASE.2019.00033]

[186] Jiang YJ, Liu H, Niu N, Zhang L, Hu YM. Extracting concise bug-fixing patches from human-written patches in version control systems. In: Proc. of the 43rd IEEE/ACM Int’l Conf. on Software Engineering (ICSE). Madrid: IEEE, 2021. 686–698.

[187] Li Y, Wang SH, Nguyen TN. DLFix: Context-based code transformation learning for automated program repair. In: Proc. of the 42nd ACM/IEEE Int’l Conf. on Software Engineering. Seoul: ACM, 2020. 602–614.

[188] Fan ZY, Gao X, Mirchev M, Roychoudhury A, Tan SH. Automated repair of programs from large language models. In: Proc. of the 45th IEEE/ACM Int’l Conf. on Software Engineering (ICSE). Melbourne: IEEE, 2023. 1469–1481.

[189] Jiang N, Liu K, Lutellier T, Tan L. Impact of code language models on automated program repair. In: Proc. of the 45th IEEE/ACM Int’l Conf. on Software Engineering. Melbourne: IEEE, 2023. 1430–1442. [doi: 10.1109/ICSE48619.2023.00125]

[190] Jia YQ, Shelhamer E, Donahue J, Karayev S, Long J, Girshick R, Guadarrama S, Darrell T. Caffe: Convolutional architecture for fast feature embedding. In: Proc. of the 22nd ACM Int’l Conf. on Multimedia. Orlando: ACM, 2014. 675–678. [doi: 10.1145/2647868.2654889]

[191] Chen TQ, Li M, Li YT, Lin M, Wang NY, Wang MJ, Xiao TJ, Xu B, Zhang CY, Zhang Z. MXNet: A flexible and efficient machine learning library for heterogeneous distributed systems. arXiv:1512.01274, 2015.

[192] Wiklund K, Eldh S, Sundmark D, Lundqvist K. Impediments for software test automation: A systematic literature review. Software Testing, Verification and Reliability, 2017, 27(8): e1639.

[193] Garcia SE. Usability testing: Creative techniques for answering your research questions. In: Proc. of the Extended Abstracts of the 2020 CHI Conf. on Human Factors in Computing Systems. Honolulu: ACM, 2020. 1–2.

[194] Haas R, Elsner D, Juergens E, Pretschner A, Apel S. How can manual testing processes be optimized? Developer survey, optimization guidelines, and case studies. In: Proc. of the 29th ACM Joint Meeting on European Software Engineering Conf. and Symp. on the Foundations of Software Engineering. Athens: ACM, 2021. 1281–1291.

[195] Petroni F, Rocktäschel T, Riedel S, Lewis P, Bakhtin A, Wu YX, Miller A. Language models as knowledge bases? In: Proc. of the 2019 Conf. on Empirical Methods in Natural Language Processing and the 9th Int’l Joint Conf. on Natural Language Processing (EMNLP-IJCNLP). Hong Kong: Association for Computational Linguistics, 2019. 2463–2473. [doi: 10.18653/v1/D19-1250]

[196] OpenAI. GPT-4 technical report. arXiv:2303.08774, 2024.

[197] Yang ZY, Li LJ, Lin K, Wang JF, Lin CC, Liu ZC, Wang LJ. The dawn of LMMs: Preliminary explorations with GPT-4V(ision). arXiv:2309.17421, 2023.

[198] ChatGPT plugins. 2023. https://openai.com/blog/chatgpt-plugins

[199] GPT-4 Turbo. OpenAI Help Center. 2023. https://help.openai.com/en/articles/8555510-gpt-4-turbo

[200] Browne R. OpenAI CEO admits a bug allowed some ChatGPT users to see others’ conversation titles. 2023. https://www.cnbc.com/2023/03/23/openai-ceo-says-a-bug-allowed-some-chatgpt-to-see-others-chat-titles.html

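Cite this article

Xiang JH, Xu XY, Kong FC, Peng P, Zhang Z, Zhang YQ. Survey on application and development of large language models in software defect detection and repair. Ruan Jian Xue Bao/Journal of Software, 2025, 36(4): 1489–1529.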

History

Received: 2023-12-06
Last revised: 2024-05-18
Published online: 2025-01-08
