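Fuzz Testing for Sequential Decision-making Model with Intervening Inert Sequences
Author: 吴泊逾, 王凯锐, 王亚文, 王俊杰

CLC number: TP311

Funding: National Natural Science Foundation of China (62232016, 62072442); Youth Innovation Promotion Association of the Chinese Academy of Sciences; Basic Research Program of the Institute of Software, Chinese Academy of Sciences (ISCAS-JCZD-202304); Major and Key Program of the Innovation Fund of the Institute of Software, Chinese Academy of Sciences (ISCAS-ZD-202302); 2024 “Innovation Team” of the Institute of Software, Chinese Academy of Sciences (2024-66)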
Abstract:

Applications of artificial intelligence have extended from relatively static tasks such as classification, translation, and question answering to relatively dynamic tasks, such as autonomous driving, robotic control, and game playing, that can only be completed through a series of “interaction-action” steps with the environment. The core of a model that performs such tasks is a sequential decision-making (SDM) algorithm. Because these models face greater uncertainty in the environment and in interaction, and because the tasks are often safety-critical, testing them is highly challenging. Existing testing techniques for intelligent algorithm models focus mainly on the reliability of a single model, the generation of diverse test scenarios for complex tasks, and simulation testing; they pay no attention to the “interaction-action” decision sequences of SDM models, and as a result they either do not apply to such models or are not cost-effective. This study proposes IIFuzzing, a fuzz testing method that intervenes in the execution of inert “interaction-action” decision sequences. Within a fuzz testing framework, IIFuzzing learns patterns of “interaction-action” decision sequences, predicts the inert sequences that will not trigger a failure, and aborts the test execution of those sequences to improve testing efficiency. Experimental evaluation on four common test configurations shows that, compared with the latest fuzz testing method for SDM models, IIFuzzing detects 16.7%–54.5% more failures in the same amount of time, and the failures it finds are also more diverse than those found by the baseline.
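
The idea of aborting inert sequences under a fixed test budget can be illustrated with a small sketch. This is not IIFuzzing itself: the toy environment, the failure oracle, the hand-written `is_inert` rule (standing in for the learned sequence predictor), and every name and parameter below are hypothetical, chosen only to show why cutting short sequences that cannot fail leaves more budget for other test cases.

```python
import random
from typing import List, Tuple

BOUND = 5.0  # hypothetical safety boundary of the toy environment


def toy_step(state: float, rng: random.Random) -> float:
    """Hypothetical environment: the state drifts by at most 1 per step."""
    return state + rng.uniform(-1.0, 1.0)


def is_failure(state: float) -> bool:
    """Hypothetical failure oracle: the state has left [-BOUND, BOUND]."""
    return abs(state) > BOUND


def is_inert(trace: List[float], steps_left: int) -> bool:
    """Stand-in for a learned predictor of inert sequences.

    Each step moves the state by at most 1, so a failure is unreachable
    when the distance to the boundary exceeds the remaining steps.
    """
    return BOUND - abs(trace[-1]) > steps_left


def run_episode(start: float, horizon: int, rng: random.Random,
                intervene: bool) -> Tuple[bool, int]:
    """Run one test case; return (failed, number of steps executed)."""
    trace = [start]
    for t in range(horizon):
        if intervene and is_inert(trace, horizon - t):
            return False, t  # abort: this sequence is predicted not to fail
        trace.append(toy_step(trace[-1], rng))
        if is_failure(trace[-1]):
            return True, t + 1
    return False, horizon


def fuzz(step_budget: int, intervene: bool, horizon: int = 20) -> Tuple[int, int]:
    """Spend a fixed step budget; return (failures found, episodes run)."""
    failures = episodes = used = 0
    while used < step_budget:
        # Seeding per episode makes both modes replay the same test cases.
        rng = random.Random(episodes)
        start = rng.uniform(-BOUND, BOUND)
        failed, steps = run_episode(start, horizon, rng, intervene)
        failures += int(failed)
        used += max(steps, 1)  # every episode costs at least one step
        episodes += 1
    return failures, episodes


if __name__ == "__main__":
    print("baseline   :", fuzz(2000, intervene=False))
    print("intervening:", fuzz(2000, intervene=True))
```

Both calls replay the same sequence of test cases under the same step budget; because aborted episodes cost fewer steps, the intervening run never executes fewer episodes than the baseline. In this toy the inertness rule is exact, so no failure is ever missed. A learned predictor would be approximate and could occasionally abort a sequence that would have failed.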

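Cite this article

吴泊逾, 王凯锐, 王亚文, 王俊杰. Fuzz Testing for Sequential Decision-making Model with Intervening Inert Sequences. Journal of Software (软件学报),,():1-15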

History
  • Received: 2024-06-04
  • Last revised: 2024-08-07
  • Published online: 2025-03-26