Survey on Fuzzing Based on Large Language Model

Corresponding author: Xue Yinxing (薛吟兴), E-mail: yxxue@ustc.edu.cn

CLC number: TP311

Funding: National Natural Science Foundation of China (61972373)


    Abstract:

    Fuzzing is an automated software testing method that feeds large volumes of automatically generated test inputs to a target software system in order to uncover security vulnerabilities, software defects, and abnormal behavior. Traditional fuzzing techniques, however, suffer from limited automation, low testing efficiency, and low code coverage, and therefore struggle to cope with large modern software systems. In recent years, the rapid development of large language models (LLMs) has brought major breakthroughs to natural language processing and has also opened up new automation options for fuzzing. To improve the effectiveness of fuzzing, existing work has proposed a variety of fuzzing methods that incorporate LLMs, covering modules such as test input generation, defect detection, and post-fuzzing processing. However, a systematic survey and discussion of LLM-based fuzzing techniques has been lacking. To fill this gap, this article comprehensively analyzes and summarizes the current state of research on LLM-based fuzzing. The main contents are: (1) an overview of the overall fuzzing workflow and the LLM-related techniques commonly used in fuzzing research; (2) a discussion of the limitations of deep-learning-based fuzzing methods from before the LLM era; (3) an analysis of how LLMs are applied at different stages of fuzzing; (4) an exploration of the main challenges facing LLM techniques in fuzzing and possible future directions.

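Cite this article

Li Yan, Yang Wenzhang, Zhang Yi, Xue Yinxing. Survey on Fuzzing Based on Large Language Model. Journal of Software (软件学报), 2025, 36(6): 0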

History
  • Received: 2024-07-17
  • Last revised: 2024-10-14
  • Published online: 2024-12-10