Abstract: Deep learning has been applied to malware detection and has achieved strong results. However, recent research shows that deep learning models are not safe: they are vulnerable to adversarial attacks. Attackers can cause malware detectors to produce incorrect output by making small modifications to malware without changing its original functionality, allowing the malware to evade detection. To defend against adversarial examples, the most commonly used method in previous work is adversarial training. Adversarial training requires generating a large number of adversarial examples to retrain the model, which is inefficient; moreover, its defense effect is limited by the adversarial example generation method used during training. Therefore, a new method is proposed to detect adversarial malware in PE format, targeting adversarial attacks that add modifications to function-independent areas of the PE file. By using model interpretation techniques, the decision-making basis of the end-to-end malware detection model is analyzed and features of adversarial examples are extracted. Anomaly detection techniques are then used to identify adversarial examples. As an add-on module to the malware detection model, the proposed method requires neither modifying nor retraining the original model. Compared with other defense methods such as adversarial training, this method is more efficient and generalizes better, meaning it can defend against a variety of adversarial attack methods. The proposed method is evaluated on a real-world malware dataset. Promising results show that it can effectively defend against adversarial attacks on end-to-end PE-format malware detection models.
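The defense outlined above can be pictured as a two-stage pipeline: derive features from the detector's decision basis via model interpretation, then flag samples whose interpretation profile looks anomalous. The following is a minimal sketch of that idea, not the paper's implementation; the `per_byte_attribution` helper is a hypothetical stand-in for a real interpretation method (e.g., gradient-based attribution over an end-to-end byte-level detector), and IsolationForest is used here only as one possible anomaly detector.

```python
# Minimal sketch (assumed pipeline, not the authors' code):
# (1) interpret the detector to get per-byte attributions,
# (2) summarize where the decision basis falls (functional vs. function-independent bytes),
# (3) fit an anomaly detector on clean samples and flag anomalous interpretation profiles.

import numpy as np
from sklearn.ensemble import IsolationForest

def per_byte_attribution(pe_bytes: np.ndarray) -> np.ndarray:
    """Hypothetical placeholder: one attribution score per input byte.
    In practice this would come from interpreting the trained PE detector."""
    rng = np.random.default_rng(0)
    return rng.random(len(pe_bytes))

def interpretation_features(pe_bytes: np.ndarray, code_end: int) -> np.ndarray:
    """Summarize the decision basis: how much attribution mass falls on the
    functional part vs. appended/function-independent bytes after `code_end`."""
    attr = np.abs(per_byte_attribution(pe_bytes))
    total = attr.sum() + 1e-12
    return np.array([
        attr[:code_end].sum() / total,   # mass on functional bytes
        attr[code_end:].sum() / total,   # mass on appended / slack bytes
        attr.max() / total,              # concentration of the decision basis
    ])

# Fit the anomaly detector on clean (non-adversarial) samples only.
rng = np.random.default_rng(1)
clean = [interpretation_features(rng.integers(0, 256, 4096), code_end=3500)
         for _ in range(200)]
detector = IsolationForest(random_state=0).fit(np.vstack(clean))

# At inference time, predict() returns -1 for anomalies, i.e. likely adversarial examples.
suspect = interpretation_features(rng.integers(0, 256, 4096), code_end=3500)
print(detector.predict(suspect.reshape(1, -1)))
```

Because the anomaly detector sits beside the original model and only consumes interpretation features, the detector itself is left untouched, which matches the abstract's claim that no retraining is required.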