Detecting out-of-distribution (OOD) samples, i.e., samples falling outside the training-set distribution, is crucial for deploying deep neural network (DNN) classifiers in open environments. OOD detection can be cast as a binary classification problem that classifies each input sample as either in-distribution (ID) or OOD. Furthermore, the detector itself may be bypassed by malicious adversarial attacks; OOD samples carrying such malicious perturbations are called adversarial OOD samples. Building a robust OOD detector that also detects adversarial OOD samples is an even more challenging task. To learn representations that are both separable and robust to malicious perturbations, existing methods usually train the DNN on adversarial OOD samples crafted within the neighborhood of auxiliary clean OOD samples. However, owing to the distributional difference between the auxiliary OOD training set and the original ID training set, training on adversarial OOD samples is not effective enough to make the ID decision boundary truly robust to adversarial perturbations. Adversarial ID samples, generated within the neighborhood of clean ID samples, carry almost the same semantics as the original ID samples; as OOD samples lying much closer to the ID region, they are effective for improving the robustness of the ID boundary against adversarial perturbations. Based on this observation, this study proposes a semi-supervised adversarial training approach, DiTing, to build robust OOD detectors that detect both clean and adversarial OOD samples. DiTing treats adversarial ID samples as auxiliary "near"-OOD samples and trains the DNN on them jointly with the other auxiliary clean and adversarial OOD samples to improve the robustness of OOD detection. Experiments show that DiTing has a significant advantage in detecting adversarial OOD samples generated by strong attacks while maintaining state-of-the-art performance on the original classification task and on detecting clean OOD samples. Code is available at https://gitee.com/zhiyang3344/diting.
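The joint training described above can be illustrated with a minimal NumPy sketch. This is not the paper's actual objective, only a common outlier-exposure-style formulation consistent with the abstract: clean ID samples are trained with the usual cross-entropy, while adversarial ID samples (treated as auxiliary near-OOD), clean auxiliary OOD samples, and adversarial OOD samples are all pushed toward a uniform softmax output. The function names and the `lam_*` weights are hypothetical.

```python
import numpy as np

def softmax(logits):
    """Numerically stable softmax over the last axis."""
    z = logits - logits.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def cross_entropy(logits, labels):
    """Mean cross-entropy with the true labels (ID classification loss)."""
    p = softmax(logits)
    return float(-np.mean(np.log(p[np.arange(len(labels)), labels] + 1e-12)))

def uniform_ce(logits):
    """Cross-entropy against the uniform distribution over K classes.
    It attains its minimum, log K, when the output is exactly uniform,
    i.e., when the sample is scored as maximally OOD."""
    p = softmax(logits)
    return float(-np.mean(np.mean(np.log(p + 1e-12), axis=-1)))

def joint_ood_loss(logits_id, labels,
                   logits_adv_id, logits_ood, logits_adv_ood,
                   lam_adv_id=0.5, lam_ood=0.5, lam_adv_ood=0.5):
    """Hypothetical joint objective: classify clean ID samples correctly,
    and drive adversarial ID samples (treated as near-OOD), clean auxiliary
    OOD samples, and adversarial OOD samples toward a uniform output."""
    return (cross_entropy(logits_id, labels)
            + lam_adv_id * uniform_ce(logits_adv_id)
            + lam_ood * uniform_ce(logits_ood)
            + lam_adv_ood * uniform_ce(logits_adv_ood))
```

In a real training loop, `logits_adv_id` and `logits_adv_ood` would come from samples crafted by an inner adversarial-perturbation step (e.g., PGD) within the neighborhoods of the clean ID and OOD samples; here they are treated as given model outputs.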