[关键词]
[摘要]
真实场景往往面临数据稀缺和数据动态变化的问题, 小样本增量学习的目的是利用少量数据推理数据知识并减缓模型对于旧知识的灾难性遗忘. 已有的小样本增量学习的算法(CEC和FACT等)主要是利用视觉特征来调整特征编码器或者分类器, 实现模型对于新数据的迁移和旧数据的抗遗忘. 但是少量数据的视觉特征往往难以建模一个类别的完整特征分布, 导致上述算法的泛化能力较弱. 相比于视觉特征, 图像类别描述的文本特征具有较好的泛化性和抗遗忘性. 因此, 在视觉语言模型的基础上, 研究基于文本知识嵌入的小样本增量学习, 通过在视觉特征中嵌入具有抗遗忘能力的文本特征, 实现小样本增量学习中新旧类别数据的有效学习. 具体而言, 在基础学习阶段, 利用视觉语言模型抽取图像的预训练视觉特征和类别的文本描述, 并通过文本编码器实现预训练视觉特征到文本空间的映射. 进一步利用视觉编码器融合学习到的文本特征和预训练视觉特征抽象具有高辨别能力的视觉特征. 在增量学习阶段, 提出类别空间引导的抗遗忘学习, 利用旧数据的类别空间编码和新数据特征微调视觉编码器和文本编码器, 实现新数据知识学习的同时复习旧知识. 在4个数据集(CIFAR-100, CUB-200, Car-196和 miniImageNet)上验证算法的有效性, 证明基于视觉语言模型文本知识嵌入可以在视觉特征的基础上进一步提升小样本增量学习的鲁棒性.
[Keywords]
[Abstract]
Real-world applications often face data scarcity and dynamically changing data. Few-shot incremental learning aims to infer knowledge from a small amount of data while mitigating the model's catastrophic forgetting of old knowledge. Existing few-shot incremental learning algorithms (CEC, FACT, etc.) mainly use visual features to adjust the feature encoder or the classifier, so that the model can transfer to new data while resisting forgetting of old data. However, the visual features of a few samples can rarely model the complete feature distribution of a class, which leaves these algorithms with weak generalization ability. Compared with visual features, the text features of image class descriptions generalize better and are more resistant to forgetting. Therefore, building on a vision-language model (VLM), this study investigates few-shot incremental learning based on textual knowledge embedding: by embedding forgetting-resistant text features into visual features, it achieves effective learning of both new and old classes. Specifically, in the base learning stage, the VLM is used to extract the pre-trained visual features of images and the text descriptions of classes, and a text encoder maps the pre-trained visual features into the text space. A visual encoder then fuses the learned text features with the pre-trained visual features to obtain highly discriminative visual features. In the incremental learning stage, class space-guided anti-forgetting learning is proposed, which uses the class space encodings of old data together with the features of new data to fine-tune the visual encoder and the text encoder, so that the model learns new knowledge while reviewing old knowledge. Experiments on four datasets (CIFAR-100, CUB-200, Car-196, and miniImageNet) verify the effectiveness of the algorithm and show that VLM-based textual knowledge embedding can further improve the robustness of few-shot incremental learning beyond what visual features alone provide.
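[CLC number]
[Funding]
Science and Technology Innovation 2030 "New Generation Artificial Intelligence" Major Project (2021ZD0112202); Beijing Natural Science Foundation (L201001, 4222039); National Natural Science Foundation of China (U21B2044, 62202331, 62376268)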