Abstract: Self-training, a common strategy for mitigating the scarcity of annotated data, typically treats high-confidence automatically annotated data generated by a teacher model as reliable. However, in low-resource scenarios for Relation Extraction (RE), this approach is hindered by the teacher model's limited generalization capacity and by easily confused relation categories. Consequently, efficiently identifying reliable instances among the automatically labeled data becomes challenging, and a large amount of low-confidence noisy data is generated. This study therefore proposes a self-training approach for low-resource relation extraction (ST-LRE). The approach helps the teacher model select reliable data based on the consistency of its predictions over paraphrases, and extracts reliable ambiguous data from the low-confidence data in a partially-labeled mode. Considering the candidate categories of ambiguous data, this study proposes a negative training approach based on the set of negative labels. Finally, a unified approach capable of both positive and negative training is proposed to jointly train on reliable and ambiguous data. In experiments, ST-LRE consistently achieves significant improvements in low-resource scenarios on two widely used RE datasets, SemEval 2010 Task 8 and Re-TACRED.
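To make the negative-training idea concrete, the following is a minimal sketch of how a unified positive/negative objective could look: reliable instances are trained with standard cross-entropy on their positive label, while ambiguous instances are trained to push probability mass away from a set of excluded (negative) labels. All function names are illustrative, and the exact formulation in the paper may differ.

```python
import math

def softmax(logits):
    # Numerically stable softmax over a list of logits.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]

def positive_loss(logits, label):
    # Positive training on a reliable instance: -log p_y.
    p = softmax(logits)
    return -math.log(p[label])

def negative_loss(logits, negative_labels):
    # Negative training on an ambiguous instance: penalize probability
    # assigned to each excluded label, i.e. -sum_{k in N} log(1 - p_k).
    p = softmax(logits)
    return -sum(math.log(1.0 - p[k]) for k in negative_labels)

def unified_loss(logits, label=None, negative_labels=None):
    # Reliable data carries a single positive label; ambiguous data
    # carries only a set of negative labels it cannot belong to.
    if label is not None:
        return positive_loss(logits, label)
    return negative_loss(logits, negative_labels)
```

For example, `negative_loss` is large when the model places high probability on a label known to be wrong, and near zero otherwise, so minimizing it steers ambiguous instances toward their remaining candidate categories.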