Pobe: Generative Model-based Out-of-distribution Text Detection Method

doi:10.13328/j.cnki.jos.006956

Volume 35, Issue 9, 2024, pp. 4365-4376

CLC Number: TP18
Abstract:

Detecting samples that fall outside the training-set distribution (out-of-distribution, OOD) is essential for a safe and reliable machine learning system. Likelihood-based generative models are a popular choice for OOD detection because they require no sample labels during training. However, recent studies show that likelihoods sometimes fail to separate OOD samples, and the causes of this failure and possible remedies remain underexplored, especially for text data. This study therefore analyzes why likelihood-based detection fails on text from two perspectives, the model and the data: insufficient generalization of the generative model and prior probability bias of the text. To tackle these problems, the study proposes a new OOD text detection method named Pobe. To address the insufficient generalization of the generative model, the study improves generalization through KNN retrieval. To address the prior probability bias of the text, the study designs a calibration strategy that uses a pre-trained language model to reduce the influence of the bias on OOD detection, and justifies the strategy with Bayes' theorem. Experimental results over a wide range of datasets show the effectiveness of the proposed method. Specifically, the average AUROC is over 99% and FPR95 is below 1% on eight datasets.
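
The abstract does not state the exact scoring rule, so the following is only a minimal sketch of the general idea, not the authors' implementation. It assumes that the calibrated score is a log-likelihood ratio between a domain generative model and a general pre-trained language model, which is one common way to cancel a text's prior probability, and it shows how AUROC and FPR95 are computed from such scores. All function names and the toy log-likelihood values are hypothetical and chosen purely for illustration.

```python
import math
from typing import Sequence


def calibrated_score(logp_domain: float, logp_background: float) -> float:
    """Assumed calibration: log-likelihood ratio (higher = more in-distribution)."""
    return logp_domain - logp_background


def auroc(id_scores: Sequence[float], ood_scores: Sequence[float]) -> float:
    """Probability that a random ID sample outscores a random OOD sample (ties count 0.5)."""
    wins = 0.0
    for s_id in id_scores:
        for s_ood in ood_scores:
            if s_id > s_ood:
                wins += 1.0
            elif s_id == s_ood:
                wins += 0.5
    return wins / (len(id_scores) * len(ood_scores))


def fpr_at_95_tpr(id_scores: Sequence[float], ood_scores: Sequence[float]) -> float:
    """Share of OOD samples accepted at the threshold that keeps at least 95% of ID samples."""
    ranked = sorted(id_scores, reverse=True)
    k = math.ceil(0.95 * len(ranked))      # number of ID samples that must be kept
    threshold = ranked[k - 1]
    accepted = sum(1 for s in ood_scores if s >= threshold)
    return accepted / len(ood_scores)


# Toy (log p_domain, log p_background) pairs, invented for illustration only.
# Two OOD texts are short and generic, so BOTH models give them a high likelihood.
id_pairs = [(-20.0, -35.0), (-25.0, -42.0), (-30.0, -44.0), (-22.0, -36.0)]
ood_pairs = [(-24.0, -22.0), (-60.0, -50.0), (-45.0, -40.0), (-28.0, -25.0)]

raw_id = [d for d, _ in id_pairs]
raw_ood = [d for d, _ in ood_pairs]
cal_id = [calibrated_score(d, b) for d, b in id_pairs]
cal_ood = [calibrated_score(d, b) for d, b in ood_pairs]

raw_auroc, raw_fpr95 = auroc(raw_id, raw_ood), fpr_at_95_tpr(raw_id, raw_ood)
cal_auroc, cal_fpr95 = auroc(cal_id, cal_ood), fpr_at_95_tpr(cal_id, cal_ood)
print(f"raw likelihood: AUROC={raw_auroc:.4f}, FPR95={raw_fpr95:.2f}")
print(f"calibrated:     AUROC={cal_auroc:.4f}, FPR95={cal_fpr95:.2f}")
```

On these toy values the raw likelihood confuses the generic OOD texts with in-distribution ones, while subtracting the background log-likelihood separates the two groups. This only illustrates why a prior-probability correction can help; it says nothing about the paper's reported results.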

Citation:

Ouyang Yawen, Gao Yuan, Zong Shi, Bao Yu, Dai Xinyu. Pobe: Generative Model-based Out-of-distribution Text Detection Method. Journal of Software, 2024, 35(9): 4365-4376

History

Received: June 2, 2022
Revised: September 20, 2022
Online: September 20, 2023

Copyright: Institute of Software, Chinese Academy of Sciences