面向深度学习的后门攻击及防御研究综述
Survey on Backdoor Attacks and Defenses for Deep Learning Research

作者: 高梦楠, 陈伟, 吴礼发, 张伯雷
Author: GAO Mengnan, CHEN Wei, WU Lifa, ZHANG Bolei
中图分类号: TP306
基金项目: 国家自然科学基金(62202238); 江苏省重点研发项目(BE2022065-5)

    摘要:

    深度学习模型是人工智能系统的重要组成部分, 被广泛应用于现实中的多种关键场景. 现有研究表明, 深度学习的低透明度与弱可解释性使得深度学习模型对扰动敏感, 因此人工智能系统面临多种安全威胁, 其中针对深度学习的后门攻击是重要威胁之一. 为了提高深度学习模型的安全性, 全面介绍计算机视觉、自然语言处理等主流深度学习系统的后门攻击与防御研究进展. 首先根据现实中攻击者的能力将后门攻击分为全过程可控后门、模型修改后门和仅数据投毒后门, 然后根据后门构建方式进行子类划分. 接着根据防御策略的对象将现有后门防御方法分为基于输入的后门防御与基于模型的后门防御. 最后汇总后门攻击常用的数据集与评价指标, 总结后门攻击与防御领域存在的问题, 并在后门攻击的安全应用场景与后门防御的有效性等方面提出建议与展望.

    Abstract:

    Deep learning models are integral components of artificial intelligence systems, widely deployed in various critical real-world scenarios. Research has shown that the low transparency and weak interpretability of deep learning models render them sensitive to perturbations. Consequently, artificial intelligence systems are exposed to multiple security threats, among which backdoor attacks on deep learning models are a significant concern. This study provides a comprehensive overview of the research progress on backdoor attacks and defenses in mainstream deep learning systems, including computer vision and natural language processing. Backdoor attacks are categorized by the attacker’s capabilities into full-process controllable backdoors, model modification backdoors, and data-poisoning-only backdoors, each of which is further subdivided according to how the backdoor is constructed. Defense strategies are divided into input-based defenses and model-based defenses, depending on the target of the defensive measures. This study also summarizes commonly used datasets and evaluation metrics in this domain. Lastly, existing challenges in backdoor attack and defense research are discussed, alongside recommendations and future directions focusing on the security application scenarios of backdoor attacks and the efficacy of defense mechanisms.
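
    To make the taxonomy above concrete, the sketch below illustrates a minimal BadNets-style data-poisoning backdoor together with the two metrics most commonly reported in this literature: attack success rate (ASR) on trigger-stamped inputs and accuracy on clean inputs. The trigger pattern, patch size, target label, poisoning rate, and the `model.predict` interface are illustrative assumptions for exposition only, not the construction of any particular attack covered by the survey.

```python
# Minimal sketch (assumptions: images are NumPy float arrays in [0, 1] with shape
# (N, H, W, C); labels are an integer NumPy array; `model.predict` returns class ids).
import numpy as np

TARGET_LABEL = 0     # attacker-chosen target class (assumption)
POISON_RATE = 0.05   # fraction of training samples to poison (assumption)


def add_trigger(image, patch_size=3):
    """Stamp a small white square into the bottom-right corner as the trigger."""
    poisoned = image.copy()
    poisoned[-patch_size:, -patch_size:, :] = 1.0
    return poisoned


def poison_dataset(images, labels, seed=0):
    """Poison a random subset of the training data: add the trigger and relabel
    the poisoned samples to the attacker's target class (data-poisoning-only setting)."""
    images, labels = images.copy(), labels.copy()
    rng = np.random.default_rng(seed)
    idx = rng.choice(len(images), size=int(POISON_RATE * len(images)), replace=False)
    for i in idx:
        images[i] = add_trigger(images[i])
        labels[i] = TARGET_LABEL
    return images, labels


def attack_success_rate(model, clean_images, clean_labels):
    """ASR: fraction of non-target test samples classified as the target class
    once the trigger is stamped on them."""
    keep = clean_labels != TARGET_LABEL
    triggered = np.stack([add_trigger(x) for x in clean_images[keep]])
    preds = np.asarray(model.predict(triggered))
    return float(np.mean(preds == TARGET_LABEL))


def clean_accuracy(model, clean_images, clean_labels):
    """Accuracy on unmodified inputs; a stealthy backdoor leaves this almost unchanged."""
    preds = np.asarray(model.predict(clean_images))
    return float(np.mean(preds == clean_labels))
```

    A stealthy backdoor keeps clean accuracy close to that of an unpoisoned model while driving ASR toward 100%, which is why the surveyed defenses examine either suspicious inputs or the trained model itself.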

引用本文

高梦楠,陈伟,吴礼发,张伯雷.面向深度学习的后门攻击及防御研究综述.软件学报,2025,36(7):3271-3305

历史
  • 收稿日期:2024-04-27
  • 最后修改日期:2024-07-15
  • 在线发布日期: 2025-04-25