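Deep Learning-based Hybrid Fuzz Testing (基于深度学习的混合模糊测试方法)

Authors: Gao Fengjuan (高凤娟), Wang Yu (王豫), Situ Lingyun (司徒凌云), Wang Linzhang (王林章)

Author biographies:

Gao Fengjuan (1991-), female, bachelor's degree. Main research areas: software engineering, program analysis, software testing, software security.
Wang Yu (1991-), male, bachelor's degree. Main research areas: software engineering, program analysis, software testing, software security.
Situ Lingyun (1988-), male, Ph.D., CCF professional member. Main research areas: software engineering, information security, static analysis, fuzz testing.
Wang Linzhang (1973-), male, Ph.D., professor, doctoral supervisor, CCF distinguished member. Main research areas: software engineering, software testing, software security.

Corresponding author:

Wang Linzhang, E-mail: lzwang@nju.edu.cn

CLC number:

TP311

Fund project:

National Natural Science Foundation of China (62032010); Postgraduate Research & Practice Innovation Program of Jiangsu Province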

Abstract:

With the rapid development of software technology, domain-oriented software systems are widely used and raise new challenges for research and practice. Domain applications place high demands on security and reliability. Symbolic execution and fuzzing have been developed for decades to safeguard the security and reliability of software systems, and a large body of research and detected bugs demonstrates their effectiveness. Because the two techniques have different strengths and weaknesses, combining them remains an active research topic. Existing hybrid methods let the two techniques assist each other, for example by handing regions that fuzzing cannot reach to symbolic execution for solving. However, such methods can decide only while fuzzing (resp. symbolic execution) is running whether to call on symbolic execution (resp. fuzzing); they cannot exploit the strengths of both at the same time, which limits performance. This paper therefore proposes a deep learning-based hybrid testing method that combines symbolic execution-based testing with fuzzing. The method aims to predict, before testing starts, the sets of paths suited to fuzzing (resp. symbolic execution), and then guides fuzzing (resp. symbolic execution) toward the regions suited to it. A hybrid mechanism is also proposed to let the two interact, further improving overall coverage. Experiments on the programs in LAVA-M show that, compared with symbolic execution or fuzzing alone, the proposed method increases branch coverage by more than 20%, increases the number of paths by about 1 to 13 times, and detects 929 more bugs.
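The core idea of the abstract, deciding before testing starts which paths go to which technique, can be illustrated with a minimal sketch. It is not the paper's implementation: the abstract does not describe the model or its outputs, so the per-path `fuzz_score`, the `threshold`, and the names `PathPrediction` and `partition_paths` are all hypothetical and used only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PathPrediction:
    path_id: str       # hypothetical identifier of a program path
    fuzz_score: float  # assumed model output: suitability for fuzzing, in [0, 1]


def partition_paths(predictions, threshold=0.5):
    """Split paths into a fuzzing set and a symbolic-execution set.

    Paths scoring at or above `threshold` are assigned to the fuzzer and
    the rest to the symbolic executor. The score and the threshold are
    assumptions of this sketch, not details taken from the paper.
    """
    fuzz_set, symex_set = [], []
    for p in predictions:
        target = fuzz_set if p.fuzz_score >= threshold else symex_set
        target.append(p.path_id)
    return fuzz_set, symex_set


predictions = [
    PathPrediction("p0", 0.91),
    PathPrediction("p1", 0.12),
    PathPrediction("p2", 0.50),
]
fuzz_set, symex_set = partition_paths(predictions)
print(fuzz_set)   # ['p0', 'p2']
print(symex_set)  # ['p1']
```

In such a scheme each tool would then be guided only toward its own path set, and the hybrid mechanism mentioned in the abstract would handle paths that the assigned tool fails to cover.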

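Cite this article

Gao Fengjuan, Wang Yu, Situ Lingyun, Wang Linzhang. Deep Learning-based Hybrid Fuzz Testing. Ruan Jian Xue Bao/Journal of Software, 2021,32(4):988-1005 (in Chinese with English abstract).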

History
  • Received: 2020-09-13
  • Last revised: 2020-10-26
  • Published online: 2021-01-22
  • Publication date: 2021-04-06