Abstract: Deep learning models are integral components of artificial intelligence systems and are widely deployed in critical real-world scenarios. Research has shown that the low transparency and weak interpretability of deep learning models make them highly sensitive to perturbations, exposing artificial intelligence systems to multiple security threats, among which backdoor attacks on deep learning models are a significant concern. This study provides a comprehensive overview of research progress on backdoor attacks and defenses in mainstream deep learning systems, including computer vision and natural language processing. Based on the attacker's capabilities, backdoor attacks are categorized into full-process controllable backdoors, model-modification backdoors, and data-poisoning backdoors, which are further classified by backdoor construction method. Defense strategies are divided into input-based defenses and model-based defenses according to the target of the defensive measures. This study also summarizes commonly used datasets and evaluation metrics in this domain. Finally, open challenges in backdoor attack and defense research are discussed, along with recommendations and future directions focusing on security application scenarios of backdoor attacks and the efficacy of defense mechanisms.