一种聚类分析驱动种子调度的模糊测试方法
作者:
作者简介:

张文(2001—),男,硕士生,CCF学生会员,主要研究领域为软件安全测试,模糊测试;陈锦富(1978—),男,博士,教授,博士生导师,CCF杰出会员,主要研究领域为软件测试,软件安全,可信软件;蔡赛华(1990—),男,博士,讲师,CCF专业会员,主要研究领域为恶意流量检测,异常数据检测,软件安全测试;张翅(1995—),男,博士生,主要研究领域为软件安全测试,模糊测试;刘一松(1966—),男,博士,教授,主要研究领域为人机交互技术,软件工程,可信软件.

通讯作者:

陈锦富,E-mail:jinfuchen@ujs.edu.cn;刘一松,E-mail:liuyisong@ujs.edu.cn

基金项目:

国家自然科学基金(62172194,62202206,U1836116);江苏省自然科学基金(BK20220515,BK20202001);中国博士后科学基金(2023T160275);江苏省研究生科研与实践创新计划(KYCX21_3375,SJCX23_2092);江苏省青蓝工程项目(2022JSDX001)


Fuzzing Approach of Clustering Analysis-driven in Seed Scheduling
Author:
  • 摘要
  • | |
  • 访问统计
  • |
  • 参考文献 [41]
  • |
  • 相似文献 [20]
  • | | |
  • 文章评论
    摘要:

    作为当前被广泛应用的自动化软件测试技术, 模糊测试的首要目标是尽可能多地探索被测程序的代码区域以达到更高的覆盖率, 从而检测出更多的漏洞或者错误. 现有的模糊测试方法大多是根据种子的历史突变数据来调度种子, 实现起来比较简单, 但忽略了种子所探索程序空间的分布情况, 导致测试工作可能会陷入只对程序的某单一区域进行探测, 造成测试资源的浪费. 提出一种基于聚类分析驱动种子调度的模糊测试方法Cluzz. 首先, Cluzz结合种子执行路径覆盖的分布来分析种子在特征空间上的区别, 使用聚类分析对种子在程序空间中的执行分布情况进行划分. 然后, 根据不同种子簇群的路径覆盖模式与聚类分析结果对种子进行优先级评估, 探索稀有代码区域并优先调度评估得分较高的种子. 其次, 通过种子评估得分为种子分配能量, 将突变得到的有趣输入保留并进行归类以更新种子簇群信息. Cluzz根据更新后的种子簇群重新评估种子, 以确保测试过程中种子的有效性, 从而在有限时间内探索更多的未知代码区域, 提高被测程序的覆盖率. 最后, 将Cluzz实现在3个当前主流的模糊器上, 并在8个流行的真实程序上进行大量测试工作. 结果表明: Cluzz检测独特崩溃的平均数量是普通模糊器的1.7倍, 在发现新边缘数量方面, 平均优于基准模糊器22.15%. 此外, 通过与现有种子调度方法进行对比, Cluzz的综合表现要优于其他基准模糊器.

    Abstract:

    As a widely used automated software testing technique, the primary goal of fuzzy testing is to explore as many code areas of the program under test as possible, thereby achieving higher coverage as well as detecting more bugs or errors. Most of existing fuzzy testing methods schedule the seed based on the historical mutation data of the seed, which is simpler to implement but ignores the distribution of program space explored by the seed, resulting in that the testing may fall into only a single region of the program to be probed, and causing the waste of testing resources. This study proposes the Cluzz, a fuzzing approach of clustering analysis-driven in seed scheduling. Firstly, Cluzz analyzes the difference between seeds in the feature space by combining the distribution of seed execution path coverage, and uses cluster analysis to classify the distribution of seeds execution in the program space. And then, Cluzz prioritizes the seeds according to the path coverage patterns of different seed clusters and the results of cluster analysis, explores the rare code regions and prioritizes the seeds with higher evaluation scores. Secondly, energy is allocated to the seeds by their evaluation scores, and the interesting inputs obtained from mutations are retained and categorized to update the seed cluster information. Cluzz reevaluates the seeds based on the updated seed clusters to ensure the validity of seeds during testing process, thereby exploring more unknown code regions in a limited time and improving the coverage of the program under test. Finally, the Cluzz is implemented on three current mainstream fuzzers and extensive testing work is conducted on eight popular real-world programs. The results show that Cluzz can detect an average of 1.7 times more unique crashes than a regular fuzzer, and it also outperforms a benchmark fuzzer by an average of 22.15% in terms of the number of new edges found. In addition, compared with the existing seed scheduling methods, the comprehensive performance of Cluzz is better than that of other benchmark fuzzers.

    参考文献
    [1] Aschermann C, Schumilo S, Blazytko T, Gawlik R, Holz T. REDQUEEN: Fuzzing with input-to-state correspondence. In: Proc. of the Network and Distributed System Security Symp. (NDSS). 2019. [doi: 10.14722/ndss.2019.23xxx]
    [2] Zheng Y, Davanian A, Yin H, Song C, Zhu H, Sun L. FIRM-AFL: High-throughput greybox fuzzing of iot firmware via augmented process emulation. In: Proc. of the 28th USENIX Security Symp. (USENIX Security 2019). 2019. 1099-1114.
    [3] Schumilo S, Aschermann C, Gawlik R, Schinzel S, Holz T. kAFL: Hardware-assisted Feedback Fuzzing for OS Kernels. In: Proc. of the 26th USENIX Security Symp. (USENIX Security 17). 2017. 167-182.
    [4] Fioraldi A, Maier D, Eißfeldt H, Heuse M. AFL++: Combining incremental steps of fuzzing research. In: Proc. of the 14th USENIX Workshop on Offensive Technologies. (WOOT 2020). 2020. 1-12.
    [5] Zalewski M. American Fuzzy Lop (AFL). 2021. http://lcamtuf.coredump.cx/afl/
    [6] Chen P, Chen H. Angora: Efficient fuzzing by principled search. In: Proc. of the IEEE Symp. on Security and Privacy (SP). IEEE, 2018. 711-725.
    [7] Rawat S, Jain V, Kumar A, Cojocar L, Giuffrida C, Bos H. VUzzer: Application-aware evolutionary fuzzing. In: Proc. of the Network and Distributed System Security Symp. (NDSS). 2017. [doi: 10.14722/ndss.2017.23404]
    [8] libFuzzer—A library for coverage-guided fuzz testing. 2021. https://llvm.org/docs/LibFuzzer.html
    [9] Zhao L, Duan Y, Yin H, Xuan J. Send hardest problems my way: Probabilistic path prioritization for hybrid fuzzing. In: Proc. of the Network and Distributed System Security Symp. (NDSS). 2019. [doi: 10.14722/ndss.2019.23504]
    [10] Herrera A, Gunadi H, Magrath S, Norrish M, Payer M, Hosking AL. Seed selection for successful fuzzing. In: Proc. of the 30th ACM SIGSOFT Int’l Symp. on Software Testing and Analysis (ISSTA). ACM, 2021. 230-243.
    [11] Godefroid P, Peleg H, Singh R. Learn&fuzz: Machine learning for input fuzzing. In: Proc. of the 32nd IEEE/ACM Int’l Conf. on Automated Software Engineering (ASE). IEEE, 2017. 50-59.
    [12] Zong P, Lv T, Wang D, Deng Z, Liang R, Chen K. FuzzGuard: Filtering out unreachable inputs in directed grey-box fuzzing through deep learning. In: Proc. of the 29th USENIX Security Symp. (USENIX Security 2020). 2020. 2255-2269.
    [13] She D, Pei K, Epstein D, Yang J, Ray B, Jana S. NEUZZ: Efficient fuzzing with neural program smoothing. In: Proc. of the IEEE Symp. on Security and Privacy (SP). IEEE, 2019. 803-817.
    [14] She D, Krishna R, Yan L, Jana S, Ray B. MTFuzz: Fuzzing with a multi-task neural network. In: Proc. of the 28th ACM Joint Meeting. on European Software Engineering Conf. and Symp. on the Foundations of Software Engineering (ESEC/FSE). ACM, 2020. 737-749.
    [15] 高凤娟, 王豫, 司徒凌云, 王林章. 基于深度学习的混合模糊测试方法. 软件学报, 2021, 32(4): 988-1005. http://www.jos.org.cn/1000-9825/6225.htm [doi: 10.13328/j.cnki.jos.006225]
    Gao FJ, Wang Y, Situ LY, Wang LZ. Deep learning-based hybrid fuzz testing. Ruan Jian Xue Bao/Journal of Software, 2021, 32(4): 988-1005(in Chinese with English abstract). http://www.jos.org.cn/1000-9825/6225.htm [doi: 10.13328/j.cnki.jos.006225]
    [16] Lyu C, Ji S, Zhang C, Li Y, Lee WH, Song Y, Beyah R. MOPT: Optimized mutation scheduling for fuzzers. In: Proc. of the 28th USENIX Security Symp. (USENIX Security 2019). 2019. 1949-1966.
    [17] Jauernig P, Jakobovic D, Picek S, Stapf E, Sadeghi AR. DARWIN: Survival of the fittest fuzzing mutators. In: Proc. of the Network and Distributed System Security Symp. (NDSS). 2023. [doi: 10.14722/ndss.2023.23159]
    [18] Peng H, Shoshitaishvili Y, Payer M. T-Fuzz: Fuzzing by program transformation. In: Proc. of the IEEE Symp. on Security and Privacy (SP). IEEE, 2018. 697-710.
    [19] Yue T, Wang P, Tang Y, Wang E, Yu B, Lu K, Zhou X. EcoFuzz: Adaptive energy-saving greybox fuzzing as a variant of the adversarial multi-armed bandit. In: Proc. of the 29th USENIX Security Symp. (USENIX Security 2020). 2020. 2307-2324.
    [20] Lemieux C, Sen K. FairFuzz: A targeted mutation strategy for increasing greybox fuzz testing coverage. In: Proc. of the 33rd ACM/ IEEE Int’l Conf. on Automated Software Engineering (ASE). ACM, 2018. 475-485.
    [21] She D, Shah A, Jana S. Effective seed scheduling for fuzzing with graph centrality analysis. In: Proc. of the IEEE Symp. on Security and Privacy (SP). IEEE, 2022. 2194-2211.
    [22] Arthur D, Vassilvitskii S. k-means++: The advantages of careful seeding. In: Proc. of the 18th Annual ACM-SIAM Symp. on Discrete Algorithms (SODA). 2007. 1027-1035.
    [23] Hartigan JA, Wong MA. Algorithm AS 136: A K-means clustering algorithm. Journal of the Royal Statistical Society, Series C (Applied Statistics), 1979, 28(1): 100-108. [doi: 10.2307/2346830]
    [24] Ester M, Kriegel HP, Sander J, et al. A density-based algorithm for discovering clusters in large spatial databases with noise. In: Proc. of the 2nd Int’l Conf. on Knowledge Discovery and Data Mining (KDD’96). 1996. 226-231.
    [25] Ward Jr JH. Hierarchical grouping to optimize an objective function. Journal of the American Statistical Association, 1963, 58(301): 236-244.
    [26] Lyu C, Liang H, Ji S, Zhang X, Zhao B, Han M, Li Y, Wang Z, Wang W, Beyah R. SLIME: Program-sensitive energy allocation for fuzzing. In: Proc. of the 31st ACM SIGSOFT Int’l Symp. on Software Testing and Analysis (ISSTA). ACM, 2022. 365-377.
    [27] Böhme M, Manès VJM, Cha SK. Boosting fuzzer efficiency: An information theoretic perspective. In: Proc. of the 28th ACM Joint Meeting on European Software Engineering Conf. and Symp. on the Foundations of Software Engineering (ESEC/FSE). ACM, 2020. 678-689.
    [28] Chen J, Wang S, Cai S, Zhang C, Chen H, Chen J, Zhang J. A novel coverage-guided greybox fuzzing based on power schedule optimization with time complexity. In: Proc. of the 37th IEEE/ACM Int’l Conf. on Automated Software Engineering (ASE). ACM, 2022. 1-5.
    [29] Böhme M, Pham VT, Roychoudhury A. Coverage-based greybox fuzzing as Markov chain. In: Proc. of the 2016 ACM SIGSAC Conf. on Computer and Communications Security (CCS). ACM, 2016. 1032-1043.
    [30] Zhang K, Xiao X, Zhu X, Sun R, Xue M, Wen S. Path transitions tell more: Optimizing fuzzing schedules via runtime program states. In: Proc. of the 44th Int’l Conf. on Software Engineering (ICSE). ACM, 2022. 1658-1668.
    [31] Yun I, Lee S, Xu M, Jang Y, Kim T. QSYM: A practical concolic execution engine tailored for hybrid fuzzing. In: Proc. of the 27th USENIX Security Symp. (USENIX Security 2018). 2018. 745-761.
    [32] 谢肖飞, 李晓红, 陈翔, 孟国柱, 刘杨. 基于符号执行与模糊测试的混合测试方法. 软件学报, 2019, 30(10): 3071-3089. http:// www.jos.org.cn/1000-9825/5789.htm [doi: 10.13328/j.cnki.jos.005789]
    Xie XF, Li XH, Chen X, Meng GZ, Liu Y. Hybrid testing based on symbolic execution and fuzzing. Ruan Jian Xue Bao/Journal of Software, 2019, 30(10): 3071-3089(in Chinese with English abstract). http://www.jos.org.cn/1000-9825/5789.htm [doi: 10.13328/j. cnki.jos.005789]
    [33] Gan S, Zhang C, Chen P, Zhao B, Qin X, Wu D, Chen Z. GREYONE: Data flow sensitive fuzzing. In: Proc. of the 29th USENIX Security Symp. (USENIX Security 2020). 2020. 2577-2594.
    [34] Liang J, Wang M, Zhou C, Wu Z, Jiang Y, Liu J, Liu Z, Sun J. PATA: Fuzzing with path aware taint analysis. In: Proc. of the IEEE Symp. on Security and Privacy (SP). IEEE, 2022. 1-17.
    [35] Wang T, Wei T, Gu G, Zou W. TaintScope: A checksum-aware directed fuzzing tool for automatic software vulnerability detection. In: Proc. of the IEEE Symp. on Security and Privacy (SP). IEEE, 2010. 497-512.
    [36] Ganesh V, Leek T, Rinard M. Taint-based directed whitebox fuzzing. In: Proc. of the IEEE 31st Int’l Conf. on Software Engineering (ICSE). IEEE, 2009. 474-484.
    [37] Lemieux C, Padhye R, Sen K, Song D. PerfFuzz: Automatically generating pathological inputs. In: Proc. of the 27th ACM SIGSOFT Int’l Symp. on Software Testing and Analysis (ISSTA). ACM, 2018. 254-265.
    [38] Petsios T, Zhao J, Keromytis AD, Jana S. SlowFuzz: Automated domain-independent detection of algorithmic complexity vulnerabilities. In: Proc. of the 2017 ACM SIGSAC Conf. on Computer and Communications Security (CCS). ACM, 2017. 2155-2168.
    [39] Coppik N, Schwahn O, Suri N. MemFuzz: Using memory accesses to guide fuzzing. In: Proc. of the 12th IEEE Conf. on Software Testing, Validation and Verification (ICST). IEEE, 2019. 48-58.
    引证文献
    网友评论
    网友评论
    分享到微博
    发 布
引用本文

张文,陈锦富,蔡赛华,张翅,刘一松.一种聚类分析驱动种子调度的模糊测试方法.软件学报,2024,35(7):3141-3161

复制
分享
文章指标
  • 点击次数:694
  • 下载次数: 2699
  • HTML阅读次数: 843
  • 引用次数: 0
历史
  • 收稿日期:2023-09-09
  • 最后修改日期:2023-10-30
  • 在线发布日期: 2024-01-05
  • 出版日期: 2024-07-06
文章二维码
您是第19727649位访问者
版权所有:中国科学院软件研究所 京ICP备05046678号-3
地址:北京市海淀区中关村南四街4号,邮政编码:100190
电话:010-62562563 传真:010-62562533 Email:jos@iscas.ac.cn
技术支持:北京勤云科技发展有限公司

京公网安备 11040202500063号