Abstract: Detecting Out-Of-Distribution (OOD) samples, i.e., samples outside the training distribution, is crucial for deploying Deep Neural Network (DNN) classifiers in the open world. OOD detection is a binary classification problem: classifying an input sample as either "In-Distribution (ID)" or "Out-Of-Distribution". However, the detector itself can be bypassed by malicious adversarial attacks, and OOD samples carrying such malicious perturbations are often referred to as adversarial OOD samples. Building robust OOD detectors that can detect adversarial OOD samples is a more challenging task. To learn representations that are more separable and robust against adversarial perturbations, existing methods usually train DNNs on adversarial OOD samples generated within the neighborhood of auxiliary clean OOD samples. However, because of the distributional difference between clean OOD samples and clean ID samples, training on adversarial OOD samples is not sufficient to make the in-distribution decision boundary robust against adversarial perturbations. Adversarial ID samples, generated within the neighborhood of (clean) ID samples, lie closer to the in-distribution region and are effective in improving the adversarial robustness of the in-distribution decision boundary. In this paper, we propose a semi-supervised adversarial training approach, DiTing, to build robust OOD detectors that detect both clean and adversarial OOD samples. DiTing treats auxiliary adversarial ID samples as OOD samples and trains on them jointly with other auxiliary clean and adversarial OOD samples to improve the robustness of the OOD detector. Experiments show that DiTing has a significant advantage in detecting adversarial OOD samples generated by strong attacks while maintaining state-of-the-art performance in classifying clean ID samples and detecting clean OOD samples. Code is available at: https://gitee.com/zhiyang3344/diting