Compiler Fuzzing Test Case Generation with Feed-forward Neural Network
Author: Xu Haoran, Wang Yongjun, Huang Zhijian, Xie Peidai, Fan Shuhui
Affiliation:

CLC Number: TP311

    Abstract:

Compiler fuzzing is one of the commonly used techniques for testing the functionality and security of compilers. The fuzzer produces grammatically correct test cases so as to exercise the deep parts of the compiler. Recently, deep learning methods based on recurrent neural networks have been introduced into the test case generation process. To address the problems of insufficient grammatical accuracy and low efficiency when generating test cases, this study proposes a method for generating compiler fuzzing test cases based on feed-forward neural networks, and designs and implements the prototype tool FAIR. Unlike methods based on token-sequence learning, FAIR extracts code fragments from the abstract syntax tree and uses a self-attention-based feed-forward neural network to capture the grammatical associations between code fragments. After learning a generative model of the programming language, FAIR automatically produces diverse test cases. Experimental results show that FAIR outperforms its competitors in both the grammatical accuracy and the efficiency of test case generation. The proposed method significantly improves the ability to detect compiler defects, and has detected 20 software defects in GCC and LLVM. In addition, the method has sound portability: the simply ported FAIR-JS has detected 2 defects in the JavaScript engine.
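The fragment-extraction step described above can be sketched as follows. Note this is only an illustration of the general idea: FAIR operates on C programs and its own fragment granularity is not specified here, so the function name `extract_fragments` and the choice of statement/expression subtrees are assumptions; Python's standard `ast` module stands in for a C parser.

```python
import ast

def extract_fragments(source: str) -> list[str]:
    """Serialize each statement/expression subtree of the AST as a fragment.

    Illustrative sketch only: FAIR extracts fragments from C abstract
    syntax trees; Python's stdlib `ast` is used here to show the idea.
    """
    tree = ast.parse(source)
    fragments = []
    for node in ast.walk(tree):
        # Treat every statement or expression subtree as one code fragment.
        if isinstance(node, (ast.stmt, ast.expr)):
            fragments.append(ast.unparse(node))
    return fragments

if __name__ == "__main__":
    src = "x = 1\nif x > 0:\n    print(x + 2)\n"
    for frag in extract_fragments(src):
        print(frag)
```

A generative model, such as the self-attention feed-forward network described above, would then be trained on such fragments to predict which fragment can legally follow a given context, rather than predicting the program token by token.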

Get Citation

Xu HR, Wang YJ, Huang ZJ, Xie PD, Fan SH. Compiler fuzzing test case generation with feed-forward neural network. Ruan Jian Xue Bao/Journal of Software, 2022, 33(6): 1996-2011 (in Chinese with English abstract).

History
  • Received: September 05, 2021
  • Revised: October 15, 2021
  • Online: January 28, 2022
  • Published: June 06, 2022