Abstract: Adversarial texts are malicious samples that cause deep learning classifiers to make errors. An adversary crafts an adversarial text that deceives the target model by adding subtle perturbations to the original text that are imperceptible to humans. Studying adversarial text generation methods makes it possible to evaluate the robustness of deep neural networks and supports subsequent improvements to model robustness. Among existing adversarial text generation methods designed for Chinese text, few target the robust Chinese BERT model. For Chinese text classification tasks, this study proposes an attack method against Chinese BERT, named Chinese BERT Tricker. The method adopts a character-level word importance scoring scheme that locates the important Chinese characters in a text. In addition, a Chinese word-level perturbation method based on the masked language model, with two types of strategies, is designed to replace the important words. Experimental results show that on two real-world text classification datasets, the proposed method reduces the classification accuracy of the Chinese BERT model to below 40% and outperforms the baseline methods on multiple attack performance metrics.
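To make the character-level importance scoring idea concrete, the following is a minimal sketch of occlusion-based scoring: mask each character in turn and rank positions by how much the classifier's confidence in the true label drops. This is an illustrative stand-in, not the authors' implementation; the `toy_prob` classifier and the function names are hypothetical.

```python
def importance_scores(text, true_label_prob, mask="[MASK]"):
    """Score each character position by the confidence drop when it is
    masked; a larger drop means the character matters more to the label."""
    base = true_label_prob(text)
    scores = []
    for i in range(len(text)):
        occluded = text[:i] + mask + text[i + 1:]
        scores.append((i, text[i], base - true_label_prob(occluded)))
    # Rank positions by descending importance.
    return sorted(scores, key=lambda t: t[2], reverse=True)

# Hypothetical toy classifier: confidence in the "positive" label
# depends only on whether the key character "好" is present.
def toy_prob(text):
    return 0.9 if "好" in text else 0.4

ranked = importance_scores("这部电影很好看", toy_prob)
print(ranked[0][1])  # prints 好
```

In the actual attack, `true_label_prob` would be the target Chinese BERT classifier, and the top-ranked positions would be handed to the masked-language-model perturbation step for candidate replacements.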