Survey on Testing of Deep Learning Frameworks

Authors: MA Xiangyue, DU Xiaoting, CAI Qing, ZHENG Yang, HU Zheng, ZHENG Zheng

Author biographies:

MA Xiangyue (1997-), male, PhD candidate; research interest: intelligent software testing. DU Xiaoting (1990-), female, PhD, lecturer, CCF professional member; research interests: intelligent software testing, mining software repositories, and defect prediction. CAI Qing (2000-), female, master's student; research interest: software testing. ZHENG Yang (1991-), male, PhD, senior engineer; research interests: testing, monitoring, and repair of artificial intelligence systems, and data-driven testing. HU Zheng (1981-), male, PhD, senior engineer, CCF professional member; research interests: trustworthy artificial intelligence and software reliability. ZHENG Zheng (1980-), male, PhD, professor, CCF professional member; research interests: reliability and testing methods for artificial intelligence software systems.

Corresponding author: ZHENG Zheng, E-mail: zhengz@buaa.edu.cn

Funding: National Natural Science Foundation of China (61772055, 61872169); Fundamental Research Funds for the Central Universities (2023RC06)

Abstract:

With the rapid development of big data and computing power, deep learning has made major breakthroughs and quickly become a field with numerous practical application scenarios and active research topics. To meet the growing demand for developing deep learning tasks, deep learning frameworks have emerged. Acting as intermediate components between application scenarios and hardware platforms, deep learning frameworks support the development of deep learning applications, enabling users to efficiently construct diverse deep neural network (DNN) models, while adapting to various computing hardware to meet the computational needs of different computing architectures and environments. Because deep learning frameworks serve as fundamental software in the field of artificial intelligence, any defect in them can have severe consequences: even a bug of only a few lines of code can trigger large-scale failures in the models built on the framework, seriously threatening the safety of deep learning systems. As a survey devoted to the testing of deep learning frameworks, this study first introduces the development history and basic architecture of deep learning frameworks. Then, based on a systematic review of 55 academic papers directly related to deep learning framework testing, it analyzes and summarizes three aspects: the bug characteristics of deep learning frameworks, key testing technologies, and testing methods based on different forms of test input. For each form of test input, the study focuses on how key testing technologies can be combined to address the corresponding research problems. Finally, it summarizes the open challenges in deep learning framework testing and discusses promising directions for future research. This survey can provide reference and guidance for researchers in the field of deep learning framework testing and help promote the continued development and maturity of deep learning frameworks.
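To make one of the surveyed testing techniques concrete, the following is an illustrative sketch (not taken from the survey itself) of metamorphic testing applied to a deep learning operator. It uses the shift-invariance property softmax(x) = softmax(x + c) as a metamorphic relation, so two executions of the same operator cross-check each other without a ground-truth oracle; the pure-Python `softmax` here is a hypothetical stand-in for a framework API under test.

```python
import math

def softmax(xs):
    # Numerically stable softmax: subtract the max before exponentiating
    # so large inputs do not overflow math.exp.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def check_shift_invariance(xs, shift, tol=1e-12):
    # Metamorphic relation: adding a constant to every input must not
    # change the softmax output. A violation signals a potential bug
    # (e.g. numerical instability) without needing expected outputs.
    base = softmax(xs)
    shifted = softmax([x + shift for x in xs])
    return all(abs(a - b) <= tol for a, b in zip(base, shifted))

if __name__ == "__main__":
    print(check_shift_invariance([1.0, 2.0, 3.0], 100.0))  # True
```

In a real fuzzing campaign, `xs` and `shift` would be randomly generated test inputs, and the relation would be checked against the framework's own operator (e.g. a softmax layer) rather than this reference implementation.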

Cite this article:

Ma XY, Du XT, Cai Q, Zheng Y, Hu Z, Zheng Z. Survey on testing of deep learning frameworks. Ruan Jian Xue Bao/Journal of Software, 2024, 35(8): 3752–3784 (in Chinese with English abstract).
History:
  • Received: 2023-05-04
  • Revised: 2023-07-03
  • Published online: 2024-01-24
  • Published in issue: 2024-08-06