Abstract: Large language models (LLMs) such as ChatGPT have found widespread application across various fields due to their strong natural language understanding and generation capabilities. However, deep learning models are vulnerable to adversarial example attacks. In natural language processing, current research on adversarial example generation typically employs CNN-based models, RNN-based models, and Transformer-based pre-trained models as target models; few studies have explored the robustness of LLMs under adversarial attack or established quantitative criteria for evaluating it. Taking ChatGPT under Chinese adversarial attacks as an example, this study introduces a novel concept termed the offset average difference (OAD) and proposes a quantifiable LLM robustness evaluation metric based on it, named the OAD-based robustness score (ORS). In a black-box attack scenario, this study selects nine mainstream Chinese adversarial attack methods based on word importance to generate adversarial texts, which are then used to attack ChatGPT and measure the attack success rate of each method. The proposed ORS assigns a robustness score to an LLM for each attack method based on its attack success rate. In addition to ChatGPT, which outputs hard labels, this study also designs an ORS for target models with soft-label outputs, based on the attack success rate and the proportion of misclassified adversarial texts with high confidence. Furthermore, this study extends the scoring formula to the fluency assessment of adversarial texts, proposing an OAD-based fluency scoring method named the OAD-based fluency score (OFS). Compared with traditional methods requiring human involvement, the proposed OFS greatly reduces evaluation costs.
Experiments on real-world Chinese news and sentiment classification datasets provide initial evidence that, for text classification tasks, the robustness score of ChatGPT against adversarial attacks is nearly 20% higher than that of Chinese BERT. Nevertheless, even the powerful ChatGPT still produces erroneous predictions under adversarial attack, with the highest attack success rate exceeding 40%.