Abstract: Detecting out-of-distribution (OOD) samples that fall outside the training distribution is crucial for deploying deep neural network (DNN) classifiers in open environments. OOD detection is a binary classification problem: input samples are classified as either in-distribution (ID) or OOD. However, the detector itself can be bypassed by malicious adversarial attacks. OOD samples carrying such malicious perturbations are called adversarial OOD samples, and building OOD detectors robust enough to detect them is considerably more challenging. Existing methods usually train the DNN on adversarial OOD samples crafted within the neighborhood of auxiliary clean OOD samples, so that it learns representations that are separable and robust to malicious perturbations. However, because of the distributional difference between the auxiliary OOD training set and the original ID training set, training on adversarial OOD samples alone is not effective enough to make the ID boundary robust to adversarial perturbations. Adversarial ID samples, generated within the neighborhood of clean ID samples, lie closer to the ID boundary and are therefore also effective at improving its adversarial robustness. This study proposes a semi-supervised adversarial training approach, DiTing, for building robust OOD detectors that detect both clean and adversarial OOD samples. The approach treats adversarial ID samples as auxiliary "near-OOD" samples and trains on them jointly with the auxiliary clean and adversarial OOD samples to improve the robustness of OOD detection. Experiments show that DiTing detects adversarial OOD samples generated by strong attacks significantly better than prior methods, while maintaining state-of-the-art performance in classifying clean ID samples and detecting clean OOD samples.
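The abstract does not spell out the training objective, so the following PyTorch sketch only illustrates one plausible reading of the joint training it describes: clean ID samples keep their classification loss, while clean OOD, adversarial OOD, and adversarial ID samples (the "near-OOD" role) all receive an OOD-side loss. The Outlier-Exposure-style uniform-softmax loss, the L-infinity PGD attack, and all names and weights (`pgd_attack`, `diting_step`, the 0.5 coefficient) are assumptions for illustration, not details taken from the paper.

```python
import torch
import torch.nn.functional as F

def uniform_loss(logits):
    """Cross-entropy to the uniform distribution over classes
    (Outlier-Exposure-style OOD objective; assumed, not from the paper)."""
    return -F.log_softmax(logits, dim=1).mean()

def pgd_attack(model, x, loss_fn, eps=8/255, alpha=2/255, steps=10):
    """L-inf PGD: ascend loss_fn(model(x_adv)) within an eps-ball around x."""
    model.eval()
    x_adv = (x + torch.empty_like(x).uniform_(-eps, eps)).clamp(0, 1).detach()
    for _ in range(steps):
        x_adv.requires_grad_(True)
        loss = loss_fn(model(x_adv))
        grad = torch.autograd.grad(loss, x_adv)[0]
        x_adv = x_adv.detach() + alpha * grad.sign()
        # Project back into the eps-ball and the valid pixel range.
        x_adv = torch.min(torch.max(x_adv, x - eps), x + eps).clamp(0, 1)
    return x_adv.detach()

def diting_step(model, x_id, y_id, x_ood, optimizer, eps=8/255):
    """One hypothetical training step: adversarial ID samples are treated
    as auxiliary near-OOD samples alongside clean/adversarial OOD samples."""
    # Adversarial ID samples: maximize the classification loss near clean ID.
    x_id_adv = pgd_attack(model, x_id,
                          lambda lo: F.cross_entropy(lo, y_id), eps)
    # Adversarial OOD samples: maximize the OOD-side loss, i.e. push
    # auxiliary OOD samples toward confident, ID-like predictions.
    x_ood_adv = pgd_attack(model, x_ood, uniform_loss, eps)

    model.train()
    loss = F.cross_entropy(model(x_id), y_id)          # clean ID: classify
    for x_out in (x_ood, x_ood_adv, x_id_adv):         # OOD side: uniform
        loss = loss + 0.5 * uniform_loss(model(x_out)) # 0.5: assumed weight
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

At test time, a sample would be flagged as OOD when its maximum softmax probability falls below a threshold; under the uniform-loss objective above, both perturbed outliers and boundary-hugging adversarial ID neighborhoods are pushed toward low-confidence predictions, which is the intuition the abstract attributes to treating adversarial ID samples as near-OOD.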