DeepRanger: Coverage-guided Deep Forest Testing Approach
Authors: Cui Zhanqi, Xie Ruilin, Chen Xiang, Liu Xiulei, Zheng Liwei
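Author biographies:

Cui Zhanqi (1984-), male, Ph.D., associate professor, CCF senior member. His main research interests are software testing and analysis, and intelligent software engineering. Xie Ruilin (1996-), male, master's student, CCF student member. His main research interest is intelligent software engineering. Chen Xiang (1980-), male, Ph.D., associate professor, CCF senior member. His main research interests are software defect prediction, software defect localization, regression testing, and combinatorial testing. Liu Xiulei (1981-), male, Ph.D., professor, CCF professional member. His main research interests are the semantic Web, ontology matching, semantic search, semantic sensors, and knowledge graphs. Zheng Liwei (1979-), male, Ph.D., associate professor, CCF professional member. His main research interests are requirements engineering, crowd collaboration, and big data mining.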

Corresponding authors:

Cui Zhanqi, czq@bistu.edu.cn; Chen Xiang, xchencs@ntu.edu.cn

CLC number:

TP311

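Foundation items:

Jiangsu Provincial Special Program for Basic Research on Frontier Leading Technologies (BK20202001); National Natural Science Foundation of China (61702041, 61601039); “Qinxin Talent” Cultivation Program of Beijing Information Science and Technology University (QXTCPC201906, QXTCPB201905)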


    Abstract:

    Compared with traditional software, deep learning software has markedly different structural characteristics. As a result, even after extensive testing, it remains difficult to measure how well the test data cover deep learning software and whether testing is adequate, and many unknown defects may still surface during later use. Deep forest is an emerging deep learning model that overcomes several shortcomings of deep neural networks, which require large amounts of training data, high-performance computing platforms, and many hyperparameters. However, no prior work has studied how to test deep forests. Based on the structural characteristics of deep forests, this study proposes a set of test coverage criteria consisting of random forest node coverage (RFNC), random forest leaf coverage (RFLC), cascade forest class coverage (CFCC), and cascade forest output coverage (CFOC). On this basis, DeepRanger, a coverage-guided test data generation method based on a genetic algorithm, is designed to automatically generate test data sets that effectively improve model coverage. To validate the proposed criteria, a set of experiments is designed and carried out on gcForest, an open-source deep forest project, and the MNIST data set. The experimental results show that all four proposed coverage criteria can effectively evaluate the adequacy of a test data set for a deep forest model. In addition, compared with a genetic algorithm based on random selection, DeepRanger, which is guided by coverage information, achieves higher coverage of the deep forest model under test.
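    The abstract names the coverage criteria but does not define them here. The sketch below illustrates one plausible reading of the first two only, assuming RFNC is the fraction of all decision-tree nodes in a random forest visited by at least one test input and RFLC is the fraction of leaves reached; the paper's exact definitions may differ, and CFCC and CFOC are not modeled. The `Node` structure and all helper names are hypothetical. The `coverage_guided_mutation` loop is a deliberately simplified, mutation-only search, not the genetic algorithm DeepRanger uses; it is included only to show how coverage feedback can decide which generated inputs to keep.

```python
import random
from dataclasses import dataclass
from typing import List, Optional, Sequence, Set, Tuple


@dataclass(eq=False)  # eq=False keeps identity hashing, so nodes can be stored in sets
class Node:
    """Hypothetical minimal decision-tree node; a node with no children is a leaf."""
    feature: int = -1                # feature index tested here (unused on leaves)
    threshold: float = 0.0           # inputs with x[feature] <= threshold go left
    left: Optional["Node"] = None
    right: Optional["Node"] = None

    @property
    def is_leaf(self) -> bool:
        return self.left is None and self.right is None


def all_nodes(root: Node) -> List[Node]:
    """Every node of one tree, via an iterative depth-first traversal."""
    out, stack = [], [root]
    while stack:
        node = stack.pop()
        out.append(node)
        if not node.is_leaf:
            stack.extend([node.left, node.right])
    return out


def decision_path(root: Node, x: Sequence[float]) -> List[Node]:
    """Nodes visited by input x on its way from the root to a leaf."""
    path, node = [], root
    while True:
        path.append(node)
        if node.is_leaf:
            return path
        node = node.left if x[node.feature] <= node.threshold else node.right


def visited_nodes(forest: Sequence[Node], inputs) -> Set[Node]:
    """Union of the decision paths of all inputs over all trees."""
    return {n for root in forest for x in inputs for n in decision_path(root, x)}


def rfnc_rflc(forest: Sequence[Node], inputs) -> Tuple[float, float]:
    """Assumed RFNC / RFLC: share of all nodes / leaves visited by at least one input."""
    every = [n for root in forest for n in all_nodes(root)]
    leaves = [n for n in every if n.is_leaf]
    seen = visited_nodes(forest, inputs)
    return len(seen) / len(every), sum(n in seen for n in leaves) / len(leaves)


def coverage_guided_mutation(forest, seeds, steps, sigma=0.5, rng=None):
    """Simplified mutation-only loop (not DeepRanger's genetic algorithm):
    a Gaussian-perturbed input is kept only if it visits a node not yet covered."""
    rng = random.Random(0) if rng is None else rng
    pool = [list(s) for s in seeds]
    covered = visited_nodes(forest, pool)
    for _ in range(steps):
        child = [v + rng.gauss(0.0, sigma) for v in rng.choice(pool)]
        new = visited_nodes(forest, [child]) - covered
        if new:
            pool.append(child)
            covered |= new
    return pool


# Toy forest of two trees over 2-D inputs: 8 nodes, 5 leaves in total.
forest = [Node(0, 0.5, Node(), Node(1, 0.5, Node(), Node())),
          Node(1, 0.0, Node(), Node())]
seeds = [[0.0, 0.0]]
print(rfnc_rflc(forest, seeds))                       # → (0.5, 0.4)
pool = coverage_guided_mutation(forest, seeds, steps=200)
print(len(pool), rfnc_rflc(forest, pool))
```

    Keeping a mutant only when it reaches previously unvisited nodes is what makes this search coverage-guided; a baseline that selects candidates at random, without consulting coverage, has no such feedback.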

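Cite this article

Cui ZQ, Xie RL, Chen X, Liu XL, Zheng LW. DeepRanger: Coverage-guided deep forest testing approach. Ruan Jian Xue Bao/Journal of Software, 2023, 34(5): 2251–2267 (in Chinese with English abstract).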

History
  • Received: 2020-09-16
  • Last revised: 2021-01-15
  • Published online: 2022-09-16
  • Publication date: 2023-05-06