视觉语言模型引导的文本知识嵌入的小样本增量学习
Authors: Yao Hantao, Yu Lu, Xu Changsheng

Corresponding author: Yao Hantao, E-mail: hantao.yao@nlpr.ia.ac.cn

Funding:

Science and Technology Innovation 2030 "New Generation Artificial Intelligence" Major Project (2021ZD0112202); Beijing Natural Science Foundation (L201001, 4222039); National Natural Science Foundation of China (U21B2044, 62202331, 62376268)


Few-shot Incremental Learning with Textual-knowledge Embedding by Visual-language Model
    摘要:

    真实场景往往面临数据稀缺和数据动态变化的问题, 小样本增量学习的目的是利用少量数据推理数据知识并减缓模型对于旧知识的灾难性遗忘. 已有的小样本增量学习的算法(CEC和FACT等)主要是利用视觉特征来调整特征编码器或者分类器, 实现模型对于新数据的迁移和旧数据的抗遗忘. 但是少量数据的视觉特征往往难以建模一个类别的完整特征分布, 导致上述算法的泛化能力较弱. 相比于视觉特征, 图像类别描述的文本特征具有较好的泛化性和抗遗忘性. 因此, 在视觉语言模型的基础上, 研究基于文本知识嵌入的小样本增量学习, 通过在视觉特征中嵌入具有抗遗忘能力的文本特征, 实现小样本增量学习中新旧类别数据的有效学习. 具体而言, 在基础学习阶段, 利用视觉语言模型抽取图像的预训练视觉特征和类别的文本描述, 并通过文本编码器实现预训练视觉特征到文本空间的映射. 进一步利用视觉编码器融合学习到的文本特征和预训练视觉特征抽象具有高辨别能力的视觉特征. 在增量学习阶段, 提出类别空间引导的抗遗忘学习, 利用旧数据的类别空间编码和新数据特征微调视觉编码器和文本编码器, 实现新数据知识学习的同时复习旧知识. 在4个数据集(CIFAR-100, CUB-200, Car-196和 miniImageNet)上验证算法的有效性, 证明基于视觉语言模型文本知识嵌入可以在视觉特征的基础上进一步提升小样本增量学习的鲁棒性.

    Abstract:

    In real scenarios, applications often face data scarcity and dynamically changing data. Few-shot incremental learning aims to infer knowledge from a small amount of data while mitigating the model's catastrophic forgetting of old knowledge. Existing few-shot incremental learning algorithms (CEC, FACT, etc.) mainly use visual features to adjust the feature encoder or the classifier, so that the model transfers to new data while resisting forgetting of old data. However, the visual features of a few samples can rarely model the complete feature distribution of a class, which weakens the generalization ability of these algorithms. Compared with visual features, the text features of image class descriptions generalize better and resist forgetting better. Therefore, building on a visual-language model (VLM), this study investigates few-shot incremental learning with textual-knowledge embedding: by embedding forgetting-resistant text features into visual features, it enables effective learning of both new and old classes. Specifically, in the base learning stage, the VLM extracts the pre-trained visual features of the images and the text descriptions of the classes, and a text encoder projects the pre-trained visual features into the text space. A visual encoder then fuses the learned text features with the pre-trained visual features to abstract visual features with high discriminative ability. In the incremental learning stage, the study proposes class space-guided anti-forgetting learning, which fine-tunes the visual encoder and the text encoder with the class space encoding of old data and the features of new data, so that new knowledge is learned while old knowledge is reviewed. The effectiveness of the algorithm is verified on four datasets (CIFAR-100, CUB-200, Car-196, and miniImageNet), proving that textual-knowledge embedding based on a VLM can further improve the robustness of few-shot incremental learning beyond visual features alone.
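The core idea of the base learning stage can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names (`fuse_prototypes`, `classify`), the fusion weight `alpha`, and the toy 16-dimensional embeddings are illustrative assumptions. The sketch shows the general mechanism the abstract describes: each class prototype blends a text embedding (stable and forgetting-resistant, since it does not depend on stored image samples) with the mean visual feature of the few support images, and queries are classified by cosine similarity to the fused prototypes.

```python
import numpy as np

def l2_normalize(x, axis=-1, eps=1e-8):
    # Normalize embeddings so that dot products equal cosine similarities.
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)

def fuse_prototypes(text_emb, visual_protos, alpha=0.5):
    # Blend each class's text embedding (forgetting-resistant) with its
    # few-shot visual prototype (sample-specific); alpha=1.0 is purely textual.
    fused = alpha * l2_normalize(text_emb) + (1.0 - alpha) * l2_normalize(visual_protos)
    return l2_normalize(fused)

def classify(query_feats, prototypes):
    # Nearest-prototype classification under cosine similarity.
    sims = l2_normalize(query_feats) @ prototypes.T
    return sims.argmax(axis=1)

# Toy setup: 3 classes, 16-dim embeddings, 5 support images per class.
rng = np.random.default_rng(0)
dim, n_cls, k_shot = 16, 3, 5
class_dirs = l2_normalize(rng.normal(size=(n_cls, dim)))      # "true" class directions
text_emb = class_dirs + 0.05 * rng.normal(size=(n_cls, dim))  # text embeddings near the truth
support = class_dirs[:, None, :] + 0.2 * rng.normal(size=(n_cls, k_shot, dim))
visual_protos = support.mean(axis=1)                          # mean of the few support features

prototypes = fuse_prototypes(text_emb, visual_protos, alpha=0.5)
queries = class_dirs + 0.2 * rng.normal(size=(n_cls, dim))    # one noisy query per class
predictions = classify(queries, prototypes)
```

In an incremental session, prototypes of old classes built this way can be kept fixed while new-class prototypes are added, which is one reason textual knowledge helps against forgetting: the text side of each prototype never needs the old images again.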

    References
    [1] He KM, Zhang XY, Ren SQ, Sun J. Deep residual learning for image recognition. In:Proc. of the 2016 IEEE Conf. on Computer Vision and Pattern Recognition. Las Vegas:IEEE, 2016. 770-778.
    [2] Krizhevsky A, Sutskever I, Hinton GE. ImageNet classification with deep convolutional neural networks. In:Proc. of the 25th Int'l Conf. on Neural Information Processing Systems. Lake Tahoe:Curran Associates Inc., 2012. 1097-1105.
    [3] 刘颖, 雷研博, 范九伦, 王富平, 公衍超, 田奇. 基于小样本学习的图像分类技术综述. 自动化学报, 2021, 47(2):297-315.
    Liu Y, Lei YB, Fan JL, Wang FP, Gong YC, Tian Q. Survey on image classification technology based on small sample learning. Acta Automatica Sinica, 2021, 47(2):297-315 (in Chinese with English abstract).
    [4] 杜彦东, 冯林, 陶鹏, 龚勋, 王俊. 元迁移学习在少样本跨域图像分类中的研究. 中国图象图形学报, 2023, 28(9):2899-2912.
    Du YD, Feng L, Tao P, Gong X, Wang J. Research on meta-transfer learning in cross-domain image classification with few-shot. Journal of Image and Graphics, 2023, 28(9):2899-2912 (in Chinese with English abstract).
    [5] Finn C, Abbeel P, Levine S. Model-agnostic meta-learning for fast adaptation of deep networks. In:Proc. of the 34th Int'l Conf. on Machine Learning. Sydney:PMLR, 2017. 1126-1135.
    [6] Jamal MA, Qi GJ. Task agnostic meta-learning for few-shot learning. In:Proc. of the 2019 IEEE/CVF Conf. on Computer Vision and Pattern Recognition. Long Beach:IEEE, 2019. 11711-11719.
    [7] 葛轶洲, 刘恒, 王言, 徐百乐, 周青, 申富饶. 小样本困境下的深度学习图像识别综述. 软件学报, 2022, 33(1):193-210. http://www.jos.org.cn/1000-9825/6342.htm
    Ge YZ, Liu H, Wang Y, Xu BL, Zhou Q, Shen FR. Survey on deep learning image recognition in dilemma of small samples. Ruan Jian Xue Bao/Journal of Software, 2022, 33(1):193-210 (in Chinese with English abstract). http://www.jos.org.cn/1000-9825/6342.htm
    [8] Kirkpatrick J, Pascanu R, Rabinowitz N, Veness J, Desjardins G, Rusu AA, Milan K, Quan J, Ramalho T, Grabska-Barwinska A, Hassabis D, Clopath C, Kumaran D, Hadsell R. Overcoming catastrophic forgetting in neural networks. Proc. of the National Academy of Sciences of the United States of America, 2017, 114(13):3521-3526.
    [9] Lee SW, Kim JH, Jun J, Ha JW, Zhang BT. Overcoming catastrophic forgetting by incremental moment matching. In:Proc. of the 31st Int'l Conf. on Neural Information Processing Systems. Long Beach:Curran Associates Inc., 2017. 4655-4665.
    [10] Aljundi R, Babiloni F, Elhoseiny M, Rohrbach M, Tuytelaars T. Memory aware synapses:Learning what (not) to forget. In:Proc. of the 15th European Conf. on Computer Vision. Munich:Springer, 2018. 144-161.
    [11] 朱飞, 张煦尧, 刘成林. 类别增量学习研究进展和性能评价. 自动化学报, 2023, 49(3):635-660.
    Zhu F, Zhang XY, Liu CL. Class incremental learning:A review and performance evaluation. Acta Automatica Sinica, 2023, 49(3):635-660 (in Chinese with English abstract).
    [12] Zhao HB, Fu YJ, Li XW, Li SY, Omar B, Li X. Few-shot class-incremental learning via feature space composition. arXiv:2006.15524, 2020.
    [13] Tao XY, Hong XP, Chang XY, Dong SL, Wei X, Gong YH. Few-shot class-incremental learning. In:Proc. of the 2020 IEEE/CVF Conf. on Computer Vision and Pattern Recognition. Seattle:IEEE, 2020. 12180-12189.
    [14] Hersche M, Karunaratne G, Cherubini G, Benini L, Sebastian A, Rahimi A. Constrained few-shot class-incremental learning. In:Proc. of the 2022 IEEE/CVF Conf. on Computer Vision and Pattern Recognition. New Orleans:IEEE, 2022. 9047-9057.
    [15] Zhou DW, Wang FY, Ye HJ, Ma L, Pu SL, Zhan DC. Forward compatible few-shot class-incremental learning. In:Proc. of the 2022 IEEE/CVF Conf. on Computer Vision and Pattern Recognition. New Orleans:IEEE, 2022. 9036-9046.
    [16] Zhang C, Song N, Lin GS, Zheng Y, Pan P, Xu YH. Few-shot incremental learning with continually evolved classifiers. In:Proc. of the 2021 IEEE/CVF Conf. on Computer Vision and Pattern Recognition. Nashville:IEEE, 2021. 12450-12459.
    [17] 张浩宇, 王天保, 李孟择, 赵洲, 浦世亮, 吴飞. 视觉语言多模态预训练综述. 中国图象图形学报, 2022, 27(9):2652-2682.
    Zhang HY, Wang TB, Li MZ, Zhao Z, Pu SL, Wu F. Comprehensive review of visual-language-oriented multimodal pre-training methods. Journal of Image and Graphics, 2022, 27(9):2652-2682 (in Chinese with English abstract).
    [18] 殷炯, 张哲东, 高宇涵, 杨智文, 李亮, 肖芒, 孙垚棋, 颜成钢. 视觉语言预训练综述. 软件学报, 2023, 34(5):2000-2023. http://www.jos.org.cn/1000-9825/6774.htm
    Yin J, Zhang ZD, Gao YH, Yang ZW, Li L, Xiao M, Sun YQ, Yan CG. Survey on vision-language pre-training. Ruan Jian Xue Bao/Journal of Software, 2023, 34(5):2000-2023 (in Chinese with English abstract). http://www.jos.org.cn/1000-9825/6774.htm
    [19] 杜鹏飞, 李小勇, 高雅丽. 多模态视觉语言表征学习研究综述. 软件学报, 2021, 32(2):327-348. http://www.jos.org.cn/1000-9825/6125.htm
    Du PF, Li XY, Gao YL. Survey on multimodal visual language representation learning. Ruan Jian Xue Bao/Journal of Software, 2021, 32(2):327-348 (in Chinese with English abstract). http://www.jos.org.cn/1000-9825/6125.htm
    [20] Liu YY, Schiele B, Sun QR. An ensemble of epoch-wise empirical Bayes for few-shot learning. In:Proc. of the 16th European Conf. on Computer Vision. Glasgow:Springer, 2020. 404-421.
    [21] Park E, Oliva JB. Meta-curvature. In:Proc. of the 33rd Int'l Conf. on Neural Information Processing Systems. Vancouver:Curran Associates Inc., 2019. 298.
    [22] Ravi S, Larochelle H. Optimization as a model for few-shot learning. In:Proc. of the 5th Int'l Conf. on Learning Representations. Toulon:OpenReview.net, 2017.
    [23] Gidaris S, Komodakis N. Dynamic few-shot visual learning without forgetting. In:Proc. of the 2018 IEEE/CVF Conf. on Computer Vision and Pattern Recognition. Salt Lake City:IEEE, 2018. 4367-4375.
    [24] Hou RB, Chang H, Ma BP, Shan SG, Chen XL. Cross attention network for few-shot classification. In:Proc. of the 33rd Int'l Conf. on Neural Information Processing Systems. Vancouver:Curran Associates Inc., 2019. 360.
    [25] Zhang C, Cai YJ, Lin GS, Shen CH. DeepEMD:Few-shot image classification with differentiable earth mover's distance and structured classifiers. In:Proc. of the 2020 IEEE/CVF Conf. on Computer Vision and Pattern Recognition. Seattle:IEEE, 2020. 12200-12210.
    [26] Wang YX, Girshick R, Hebert M, Hariharan B. Low-shot learning from imaginary data. In:Proc. of the 2018 IEEE/CVF Conf. on Computer Vision and Pattern Recognition. Salt Lake City:IEEE, 2018. 7278-7286.
    [27] Satorras GV, Estrach BJ. Few-shot learning with graph neural networks. In:Proc. of the 6th Int'l Conf. on Learning Representations. Vancouver:OpenReview.net, 2018.
    [28] Kim J, Kim T, Kim S, Yoo CD. Edge-labeling graph neural network for few-shot learning. In:Proc. of the 2019 IEEE/CVF Conf. on Computer Vision and Pattern Recognition. Long Beach:IEEE, 2019. 11-20.
    [29] Gidaris S, Komodakis N. Generating classification weights with GNN denoising autoencoders for few-shot learning. In:Proc. of the 2019 IEEE/CVF Conf. on Computer Vision and Pattern Recognition. Long Beach:IEEE, 2019. 21-30.
    [30] Rebuffi SA, Kolesnikov A, Sperl G, Lampert CH. iCaRL:Incremental classifier and representation learning. In:Proc. of the 2017 IEEE Conf. on Computer Vision and Pattern Recognition. Honolulu:IEEE, 2017. 5533-5542.
    [31] Shin H, Lee JK, Kim J, Kim J. Continual learning with deep generative replay. In:Proc. of the 31st Int'l Conf. on Neural Information Processing Systems. Long Beach:Curran Associates Inc., 2017. 2994-3003.
    [32] Wu CS, Herranz L, Liu XL, Wang YX, van de Weijer J, Raducanu B. Memory replay GANs:Learning to generate images from new categories without forgetting. In:Proc. of the 32nd Int'l Conf. on Neural Information Processing Systems. Montréal:Curran Associates Inc., 2018. 5966-5976.
    [33] Kamra N, Gupta U, Liu Y. Deep generative dual memory network for continual learning. arXiv:1710.10368, 2017.
    [34] Liu XL, Masana M, Herranz L, Van de Weijer J, López AM, Bagdanov AD. Rotate your networks:Better weight consolidation and less catastrophic forgetting. In:Proc. of the 24th Int'l Conf. on Pattern Recognition (ICPR). Beijing:IEEE, 2018. 2262-2268.
    [35] Rusu AA, Rabinowitz NC, Desjardins G, Soyer H, Kirkpatrick J, Kavukcuoglu K, Pascanu R, Hadsell R. Progressive neural networks. arXiv:1606.04671, 2022.
    [36] Yoon J, Yang E, Lee J, Hwang SJ. Lifelong learning with dynamically expandable networks. In:Proc. of the 6th Int'l Conf. on Learning Representations. Vancouver:OpenReview.net, 2018.
    [37] Rajasegaran J, Hayat M, Khan SH, Khan FS, Shao L. Random path selection for continual learning. In:Proc. of the 33rd Int'l Conf. on Neural Information Processing Systems. Vancouver:NeurIPS, 2019. 12648-12658.
    [38] Zeng GX, Chen Y, Cui B, Yu S. Continual learning of context-dependent processing in neural networks. Nature Machine Intelligence, 2019, 1(8):364-372.
    [39] He X, Jaeger H. Overcoming catastrophic interference using conceptor-aided backpropagation. In:Proc. of the 6th Int'l Conf. on Learning Representations. Vancouver:OpenReview.net, 2018.
    [40] Farajtabar M, Azizan N, Mott A, Li A. Orthogonal gradient descent for continual learning. In:Proc. of the 23rd Int'l Conf. on Artificial Intelligence and Statistics. Palermo:PMLR, 2020. 3762-3773.
    [41] Ren MY, Liao RJ, Fetaya E, Zemel RS. Incremental few-shot learning with attention attractor networks. In:Proc. of the 33rd Int'l Conf. on Neural Information Processing Systems. Vancouver:Curran Associates Inc., 2019. 5275-5285.
    [42] Ayub A, Wagner AR. Cognitively-inspired model for incremental learning using a few examples. In:Proc. of the 2020 IEEE/CVF Conf. on Computer Vision and Pattern Recognition Workshops. Seattle:IEEE, 2020. 897-906.
    [43] Yang BY, Lin MB, Zhang YX, Liu BH, Liang XD, Ji RR, Ye QX. Dynamic support network for few-shot class incremental learning. IEEE Trans. on Pattern Analysis and Machine Intelligence, 2022, 45(3):2945-2951.
    [44] Zhu K, Cao Y, Zhai W, Cheng J, Zha ZJ. Self-promoted prototype refinement for few-shot class-incremental learning. In:Proc. of the 2021 IEEE/CVF Conf. on Computer Vision and Pattern Recognition. Nashville:IEEE, 2021. 6797-6806.
    [45] Akyürek AF, Akyürek E, Wijaya DT, Andreas J. Subspace regularizers for few-shot class incremental learning. In:Proc. of the 10th Int'l Conf. on Learning Representations. OpenReview.net, 2022.
    [46] Tian SS, Li LS, Li WJ, Ran H, Ning X, Tiwari P. A survey on few-shot class-incremental learning. arXiv:2304.08130, 2023.
    [47] Chi ZX, Gu L, Liu H, Wang Y, Yu YH, Tang J. MetaFSCIL:A meta-learning approach for few-shot class incremental learning. In:Proc. of the 2022 IEEE/CVF Conf. on Computer Vision and Pattern Recognition. New Orleans:IEEE, 2022. 14166-14175.
    [48] Zou YX, Zhang SH, Li YH, Li RX. Margin-based few-shot class-incremental learning with class-level overfitting mitigation. In:Proc. of the 36th Int'l Conf. on Neural Information Processing Systems. New Orleans:NeurIPS, 2022. 27267-27279.
    [49] Yang YB, Yuan HB, Li XT, Lin ZC, Torr PHS, Tao DC. Neural collapse inspired feature-classifier alignment for few-shot class-incremental learning. In:Proc. of the 11th Int'l Conf. on Learning Representations. Kigali:OpenReview.net, 2023.
    [50] Zhou DW, Ye HJ, Ma L, Xie D, Pu SL, Zhan DC. Few-shot class-incremental learning by sampling multi-phase tasks. IEEE Trans. on Pattern Analysis and Machine Intelligence, 2023, 45(11):12816-12831.
    [51] Liu H, Gu L, Chi ZX, Wang Y, Yu YH, Chen J, Tang J. Few-shot class-incremental learning via entropy-regularized data-free replay. In:Proc. of the 17th European Conf. on Computer Vision. Tel Aviv:Springer, 2022. 146-162.
    [52] Cheraghian A, Rahman S, Fang PF, Roy SK, Petersson L, Harandi M. Semantic-aware knowledge distillation for few-shot class-incremental learning. In:Proc. of the 2021 IEEE/CVF Conf. on Computer Vision and Pattern Recognition. Nashville:IEEE, 2021. 2534-2543.
    [53] Dong SL, Hong XP, Tao XY, Chang XY, Wei X, Gong YH. Few-shot class-incremental learning via relation knowledge distillation. In:Proc. of the 35th AAAI Conf. on Artificial Intelligence. AAAI, 2021. 1255-1263.
    [54] Radford A, Kim JW, Hallacy C, Ramesh A, Goh G, Agarwal S, Sastry G, Askell A, Mishkin P, Clark J, Krueger G, Sutskever I. Learning transferable visual models from natural language supervision. In:Proc. of the 38th Int'l Conf. on Machine Learning. PMLR, 2021. 8748-8763.
    [55] Alayrac JB, Donahue J, Luc P, Miech A, Barr I, Hasson Y, Lenc K, Mensch A, Millican K, Reynolds M, Ring R, Rutherford E, Cabi S, Han TD, Gong ZT, Samangooei S, Monteiro M, Menick JL, Borgeaud S, Brock A, Nematzadeh A, Sharifzadeh S, Binkowski M, Barreira R, Vinyals O, Zisserman A, Simonyan K. Flamingo:A visual language model for few-shot learning. In:Proc. of the 36th Int'l Conf. on Neural Information Processing Systems. New Orleans:NeurIPS, 2022. 23716-23736.
    [56] Jia C, Yang YF, Xia Y, Chen YT, Parekh Z, Pham H, Le QV, Sung YH, Li Z, Duerig T. Scaling up visual and vision-language representation learning with noisy text supervision. In:Proc. of the 38th Int'l Conf. on Machine Learning. PMLR, 2021. 4904-4916.
    [57] Krizhevsky A. Learning multiple layers of features from tiny images. Technical Report, Toronto:University of Toronto, 2009.
    [58] Wah C, Branson S, Welinder P, Perona P, Belongie S. The Caltech-UCSD Birds-200-2011 dataset. Technical Report, Pasadena:California Institute of Technology, 2011.
    [59] Krause J, Stark M, Deng J, Fei-Fei L. 3D object representations for fine-grained categorization. In:Proc. of the 2013 IEEE Int'l Conf. on Computer Vision Workshops. Sydney:IEEE, 2013. 554-561.
    [60] Dosovitskiy A, Beyer L, Kolesnikov A, Weissenborn D, Zhai XH, Unterthiner T, Dehghani M, Minderer M, Heigold G, Gelly S, Uszkoreit J, Houlsby N. An image is worth 16x16 words:Transformers for image recognition at scale. In:Proc. of the 9th Int'l Conf. on Learning Representations. OpenReview.net, 2021.
    [61] Castro FM, Marín-Jiménez MJ, Guil N, Schmid C, Alahari K. End-to-end incremental learning. In:Proc. of the 15th European Conf. on Computer Vision. Munich:Springer, 2018. 241-257.
Cite this article:

Yao HT, Yu L, Xu CS. Few-shot incremental learning with textual-knowledge embedding by visual-language model. Ruan Jian Xue Bao/Journal of Software, 2024, 35(5): 2101-2119 (in Chinese with English abstract).
History
  • Received: 2023-04-06
  • Revised: 2023-06-08
  • Published online: 2023-09-11
  • Published: 2024-05-06