融合多模态数据的小样本命名实体识别方法
作者:
作者单位:

作者简介:

通讯作者:

中图分类号:

基金项目:

国家自然科学基金(62276233,62302451);浙江省自然科学基金(LQ22F020018);浙江省重点研发项目(2023C01048)


Multimodal Data Fusion for Few-shot Named Entity Recognition Method
Author:
Affiliation:

Fund Project:

  • 摘要
  • |
  • 图/表
  • |
  • 访问统计
  • |
  • 参考文献
  • |
  • 相似文献
  • |
  • 引证文献
  • |
  • 资源附件
  • |
  • 文章评论
    摘要:

    作为自然语言处理领域的关键子任务,命名实体识别通过提取文本中的关键信息,帮助机器翻译、文本生成、知识图谱构建以及多模态数据融合等许多下游任务深度理解文本蕴含的复杂语义信息,有效地完成任务.在实际生活中,由于时间和人力等成本问题,命名实体识别任务常常受限于标注样本的稀缺.尽管基于文本的小样本命名实体识别方法已取得较好的泛化表现,但由于样本量有限,使得模型能提取的语义信息也十分受限,进而导致模型预测效果依然不佳.针对标注样本稀缺给基于文本的小样本命名实体识别方法带来的挑战,提出了一种融合多模态数据的小样本命名实体识别模型,借助多模态数据提供额外语义信息,帮助模型提升预测效果,进而可以有效提升多模态数据融合、建模效果.该方法将图像信息转化为文本信息作为辅助模态信息,有效地解决了由文本与图像蕴含语义信息粒度不一致导致的模态对齐效果不佳的问题.为了有效地考虑实体识别中的标签依赖关系,使用CRF框架并使用最先进的元学习方法分别作为发射模块和转移模块.为了缓解辅助模态中的噪声样本对模型的负面影响,提出一种基于元学习的通用去噪网络.该去噪网络在数据量十分有限的情况下,依然可以有效地评估辅助模态中不同样本的差异性以及衡量样本对模型的有益程度.最后,在真实的单模态和多模态数据集上进行了大量的实验.实验结果验证了该方法的预测F1值比基准方法至少提升了10%,并具有良好的泛化性.

    Abstract:

    As a crucial subtask in natural language processing (NLP), named entity recognition (NER) aims to extract the import information from text, which can help many downstream tasks such as machine translation, text generation, knowledge graph construction, and multi-modal data fusion to deeply understand the complex semantic information of the text and effectively complete these tasks. In practice, due to time and labor costs, NER suffers from annotated data scarcity, known as few-shot NER. Although few-shot NER methods based on text have achieved sound generalization performance, the semantic information that the model can extract is still limited due to the few samples, which leads to the poor prediction effect of the model. To this end, this study proposes a few-shot NER model based on the multi-modal dataset fusion, which provides additional semantic information with multi-modal data for the first time, to help the model prediction and can further effectively improve the effect of multimodal data fusion and modeling. This method converts image information into text information as auxiliary modality information, which effectively solves the problem of poor modality alignment caused by the inconsistent granularity of semantic information contained in text and images. In order to effectively consider the label dependencies in few-shot NER, this study uses the CRF framework and introduces the state-of-the-art meta-learning methods as the emission module and the transition module, respectively. To alleviate the negative impact of noisy samples in the auxiliary modal samples, this study proposes a general denoising network based on the idea of meta-learning. The denoising network can measure the variability of the samples and evaluate the beneficial extent of each sample to the model. Finally, this study conducts extensive experiments on real unimodal and multimodal data sets. The experimental results show the outstanding generalization performance of the proposed method, where the proposed method outperforms the state-of-the-art methods by 10 F1 scores in the 1-shot setting.

    参考文献
    相似文献
    引证文献
引用本文

张天明,张杉,刘曦,曹斌,范菁.融合多模态数据的小样本命名实体识别方法.软件学报,2024,35(3):1107-1124

复制
分享
文章指标
  • 点击次数:
  • 下载次数:
  • HTML阅读次数:
  • 引用次数:
历史
  • 收稿日期:2023-07-15
  • 最后修改日期:2023-09-05
  • 录用日期:
  • 在线发布日期: 2023-11-08
  • 出版日期: 2024-03-06
文章二维码
您是第位访问者
版权所有:中国科学院软件研究所 京ICP备05046678号-3
地址:北京市海淀区中关村南四街4号,邮政编码:100190
电话:010-62562563 传真:010-62562533 Email:jos@iscas.ac.cn
技术支持:北京勤云科技发展有限公司

京公网安备 11040202500063号