Few-shot Incremental Learning with Textual-knowledge Embedding by Visual-language Model

doi:10.13328/j.cnki.jos.007022


Volume 35, Issue 5, 2024, pp. 2101-2119

Author:
YAO Han-Tao
State Key Laboratory of Multimodal Artificial Intelligence Systems (Institute of Automation, Chinese Academy of Sciences), Beijing 100190, China
YU Lu
School of Computer Science and Engineering, Tianjin University of Technology, Tianjin 300384, China
XU Chang-Sheng
State Key Laboratory of Multimodal Artificial Intelligence Systems (Institute of Automation, Chinese Academy of Sciences), Beijing 100190, China; School of Artificial Intelligence, University of Chinese Academy of Sciences, Beijing 100049, China

                    

Abstract:

Real-world applications often face data scarcity and continually changing data. Few-shot incremental learning aims to learn new knowledge from a small amount of data while reducing the model's catastrophic forgetting of old knowledge. Existing few-shot incremental learning algorithms (CEC, FACT, etc.) mainly use visual features to adjust the feature encoder or classifier, so that the model transfers to new data without forgetting old data. However, the visual features of a few samples can rarely model the complete feature distribution of a class, which weakens the generalization ability of these algorithms. Compared with visual features, the text features of image class descriptions generalize better and resist forgetting better. Therefore, building on a visual-language model (VLM), this study investigates few-shot incremental learning with textual-knowledge embedding: by embedding forgetting-resistant text features into visual features, it learns both new and old classes effectively. Specifically, in the base learning stage, the study uses the VLM to extract the pre-trained visual features and class text descriptions of each image. It then uses the text encoder to project the pre-trained visual features into the text space. Next, it uses the visual encoder to fuse the learned text features with the pre-trained visual features, yielding highly discriminative visual features. In the incremental learning stage, the study proposes class-space guided anti-forgetting learning, which uses the class-space encodings of old and new data features to fine-tune the visual encoder and text encoder, so that new knowledge is learned while old knowledge is reviewed. Experiments on four datasets (CIFAR-100, CUB-200, Car-196, and miniImageNet) verify the effectiveness of the algorithm, showing that VLM-based textual-knowledge embedding further improves the robustness of few-shot incremental learning beyond what visual features alone provide.
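
The data flow described in the abstract (project visual features into the text space, fuse them with the original visual features, then classify against class text features, adding new classes session by session) can be illustrated with a toy sketch. This is not the authors' implementation: the linear projection, the weighted-average fusion rule, the parameter fuse_weight, and all function names are assumptions made for illustration, and the fine-tuning of the encoders is omitted. The sketch also assumes that visual and text features share one dimension.

```python
import math


def normalize(vec):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def matvec(matrix, vec):
    """Multiply a row-major matrix by a vector."""
    return [sum(w * x for w, x in zip(row, vec)) for row in matrix]


def cosine(a, b):
    """Cosine similarity between two vectors."""
    return sum(x * y for x, y in zip(normalize(a), normalize(b)))


def embed_text_knowledge(visual_feat, text_proj, fuse_weight=0.5):
    """Hypothetical fusion: project the visual feature into text space,
    then mix the projection back into the visual feature."""
    text_like = matvec(text_proj, visual_feat)  # stand-in for the text-space projection
    fused = [(1 - fuse_weight) * v + fuse_weight * t
             for v, t in zip(visual_feat, text_like)]
    return normalize(fused)


def classify(visual_feat, class_text_feats, text_proj):
    """Pick the class whose text feature is most similar to the fused feature."""
    fused = embed_text_knowledge(visual_feat, text_proj)
    scores = {name: cosine(fused, feat) for name, feat in class_text_feats.items()}
    return max(scores, key=scores.get)


def add_session(class_text_feats, new_class_text_feats):
    """Extend the class space with new classes; existing entries are never overwritten."""
    merged = dict(class_text_feats)
    for name, feat in new_class_text_feats.items():
        merged.setdefault(name, feat)
    return merged


# Toy data: 3-dimensional features and an identity projection (both assumptions).
IDENTITY = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
base_classes = {"cat": [1.0, 0.0, 0.0], "dog": [0.0, 1.0, 0.0]}

print(classify([0.9, 0.1, 0.0], base_classes, IDENTITY))  # -> cat

# Incremental session: one new class arrives, old classes are kept as they were.
all_classes = add_session(base_classes, {"bird": [0.0, 0.0, 1.0]})
print(classify([0.1, 0.0, 0.8], all_classes, IDENTITY))   # -> bird
print(classify([0.9, 0.1, 0.0], all_classes, IDENTITY))   # -> cat
```

In this toy version, old classes are not forgotten simply because their text features are left untouched when a session is added. The study instead fine-tunes both encoders under class-space guidance, which the sketch does not attempt to reproduce.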

Key words: few-shot incremental learning (FSIL); visual-language model; textual-knowledge embedding; class-space guided anti-forgetting learning

Get Citation

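YAO Han-Tao, YU Lu, XU Chang-Sheng. Few-shot Incremental Learning with Textual-knowledge Embedding by Visual-language Model. Journal of Software, 2024, 35(5): 2101-2119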



History

Received: April 6, 2023
Revised: June 8, 2023
Online: September 11, 2023
Published: May 6, 2024

Copyright: Institute of Software, Chinese Academy of Sciences