Compiler Fuzzing Test Case Generation with Feed-forward Neural Network
Author: Xu Haoran, Wang Yongjun, Huang Zhijian, Xie Peidai, Fan Shuhui
Affiliation:

CLC Number: TP311

    Abstract:

Compiler fuzzing is one of the commonly used techniques for testing the functionality and security of compilers. The fuzzer produces grammatically correct test cases so as to exercise the deep parts of the compiler. Recently, deep learning methods based on recurrent neural networks have been introduced into the test case generation process. To address the problems of insufficient grammatical accuracy and low efficiency when generating test cases, this study proposes a method for generating compiler fuzzing test cases based on feed-forward neural networks, and designs and implements the prototype tool FAIR. Unlike methods based on token-sequence learning, FAIR extracts code fragments from the abstract syntax tree and uses a self-attention-based feed-forward neural network to capture the grammatical associations between code fragments. After learning a generative model of the programming language, FAIR automatically produces diverse test cases. Experimental results show that FAIR outperforms its competitors in both the grammatical accuracy and the efficiency of test case generation. The proposed method significantly improves the ability to detect compiler defects, and has detected 20 software defects in GCC and LLVM. In addition, the method has sound portability: the simply ported FAIR-JS has detected 2 defects in the JavaScript engine.
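The fragment-extraction step described above can be sketched as follows. Note this is only an illustration of the general idea: FAIR operates on C programs and its own fragment granularity is not specified here, so the function name `extract_fragments` and the choice of statement/expression subtrees are assumptions; Python's standard `ast` module stands in for a C parser.

```python
import ast

def extract_fragments(source: str) -> list[str]:
    """Serialize each statement/expression subtree of the AST as a fragment.

    Illustrative sketch only: FAIR extracts fragments from C abstract
    syntax trees; Python's stdlib `ast` is used here to show the idea.
    """
    tree = ast.parse(source)
    fragments = []
    for node in ast.walk(tree):
        # Treat every statement or expression subtree as one code fragment.
        if isinstance(node, (ast.stmt, ast.expr)):
            fragments.append(ast.unparse(node))
    return fragments

if __name__ == "__main__":
    src = "x = 1\nif x > 0:\n    print(x + 2)\n"
    for frag in extract_fragments(src):
        print(frag)
```

A generative model, such as the self-attention feed-forward network described above, would then be trained on such fragments to predict which fragment can legally follow a given context, rather than predicting the program token by token.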

Get Citation

Xu HR, Wang YJ, Huang ZJ, Xie PD, Fan SH. Compiler fuzzing test case generation with feed-forward neural network. Ruan Jian Xue Bao/Journal of Software, 2022, 33(6): 1996-2011 (in Chinese with English abstract).

History
  • Received: September 05, 2021
  • Revised: October 15, 2021
  • Online: January 28, 2022
  • Published: June 06, 2022