Abstract: Pretrained code models have demonstrated strong capabilities in code understanding and analysis and have become important tools and a major research focus in source code vulnerability detection. However, like traditional deep learning models, pretrained code models exhibit robustness weaknesses when exposed to carefully crafted adversarial code: by introducing semantics-preserving perturbations, attackers can mislead a model into classifying vulnerable code as non-vulnerable, posing a significant threat to software security. Adversarial attacks on pretrained vulnerability detection models therefore not only provide an effective means of evaluating the robustness of pretrained code models but also offer critical insights for building future vulnerability detection models and defense mechanisms. Under the hard-label black-box setting, this study proposes VulBlurrer, a black-box adversarial attack method targeting pretrained vulnerability detection models. VulBlurrer designs a directed synonymous code transformation strategy that prioritizes perturbations in highly sensitive regions adjacent to vulnerabilities and in specific statements. It also introduces an escape score based on feature consistency, semantic consistency, and code fluency, which quantifies the potential attack value of candidate samples without requiring access to the target model's internal information. Furthermore, the study adopts a genetic-algorithm-based optimization strategy for adversarial code, in which the weights used to compute the escape score are dynamically adjusted and an elite retention mechanism is applied during the iterative search, further improving attack effectiveness. VulBlurrer and baseline methods are evaluated against vulnerability detection models built on CodeBERT, GraphCodeBERT, CodeT5, and UniXcoder. VulBlurrer achieves attack success rates of 85.51%, 91.47%, 93.14%, and 71.61% on the four target models, with average numbers of queries of 12.67, 9.10, 11.07, and 19.44, respectively. Compared with existing methods, VulBlurrer attains higher attack success rates and a better trade-off between attack success rate and query efficiency, and the generated adversarial code also exhibits superior consistency and fluency. In addition, empirical studies on ChatGPT, DeepSeek, and LLM-based programming assistants, including GitHub Copilot and TRAE, verify the effectiveness of the proposed method against large language models. These results indicate that pretrained code models still face robustness challenges from adversarial attacks in vulnerability detection tasks, and that vulnerability detection tools based on pretrained models and large language models require further improvements to withstand adversarial code.
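To make the escape score concrete, one natural reading of the description above is a weighted combination of the three criteria, with the weights re-tuned over the genetic search. The following is only an illustrative sketch under that assumption; the notation is not taken from the paper:
\[
S_{\text{escape}}(x') \;=\; w_{\text{feat}}^{(t)}\,\mathrm{Cons}_{\text{feat}}(x, x') \;+\; w_{\text{sem}}^{(t)}\,\mathrm{Cons}_{\text{sem}}(x, x') \;+\; w_{\text{flu}}^{(t)}\,\mathrm{Flu}(x'),
\]
where \(x\) is the original vulnerable sample, \(x'\) a semantics-preserving candidate variant, and \(w^{(t)}\) denotes weights dynamically adjusted at iteration \(t\); under an elite retention scheme, the highest-scoring candidates would be carried unchanged into the next generation of the search.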