视觉语言模型引导的文本知识嵌入的小样本增量学习
Authors: Yao Hantao, Yu Lu, Xu Changsheng

Corresponding author: Yao Hantao, E-mail: hantao.yao@nlpr.ia.ac.cn

Funding:

Science and Technology Innovation 2030 "New Generation Artificial Intelligence" Major Project (2021ZD0112202); Beijing Natural Science Foundation (L201001, 4222039); National Natural Science Foundation of China (U21B2044, 62202331, 62376268)


Few-shot Incremental Learning with Textual-knowledge Embedding by Visual-language Model
    摘要:

    真实场景往往面临数据稀缺和数据动态变化的问题, 小样本增量学习的目的是利用少量数据推理数据知识并减缓模型对于旧知识的灾难性遗忘. 已有的小样本增量学习的算法(CEC和FACT等)主要是利用视觉特征来调整特征编码器或者分类器, 实现模型对于新数据的迁移和旧数据的抗遗忘. 但是少量数据的视觉特征往往难以建模一个类别的完整特征分布, 导致上述算法的泛化能力较弱. 相比于视觉特征, 图像类别描述的文本特征具有较好的泛化性和抗遗忘性. 因此, 在视觉语言模型的基础上, 研究基于文本知识嵌入的小样本增量学习, 通过在视觉特征中嵌入具有抗遗忘能力的文本特征, 实现小样本增量学习中新旧类别数据的有效学习. 具体而言, 在基础学习阶段, 利用视觉语言模型抽取图像的预训练视觉特征和类别的文本描述, 并通过文本编码器实现预训练视觉特征到文本空间的映射. 进一步利用视觉编码器融合学习到的文本特征和预训练视觉特征抽象具有高辨别能力的视觉特征. 在增量学习阶段, 提出类别空间引导的抗遗忘学习, 利用旧数据的类别空间编码和新数据特征微调视觉编码器和文本编码器, 实现新数据知识学习的同时复习旧知识. 在4个数据集(CIFAR-100, CUB-200, Car-196和 miniImageNet)上验证算法的有效性, 证明基于视觉语言模型文本知识嵌入可以在视觉特征的基础上进一步提升小样本增量学习的鲁棒性.

    Abstract:

    In real scenarios, applications often face data scarcity and dynamically changing data. Few-shot incremental learning aims to infer knowledge from a small amount of data while mitigating the model's catastrophic forgetting of old knowledge. Existing few-shot incremental learning algorithms (CEC, FACT, etc.) mainly use visual features to adjust the feature encoder or the classifier, so that the model transfers to new data while resisting forgetting of old data. However, the visual features of a few samples can rarely model the complete feature distribution of a class, which weakens the generalization ability of these algorithms. Compared with visual features, the text features of image class descriptions generalize better and resist forgetting better. Therefore, building on a visual-language model (VLM), this study investigates few-shot incremental learning with textual-knowledge embedding: by embedding forgetting-resistant text features into visual features, it enables effective learning of both new and old classes. Specifically, in the base learning stage, the VLM extracts the pre-trained visual features of the images and the text descriptions of the classes, and a text encoder projects the pre-trained visual features into the text space. A visual encoder then fuses the learned text features with the pre-trained visual features to abstract visual features with high discriminative ability. In the incremental learning stage, the study proposes class space-guided anti-forgetting learning, which fine-tunes the visual encoder and the text encoder with the class space encoding of old data and the features of new data, so that new knowledge is learned while old knowledge is reviewed. The effectiveness of the algorithm is verified on four datasets (CIFAR-100, CUB-200, Car-196, and miniImageNet), proving that textual-knowledge embedding based on a VLM can further improve the robustness of few-shot incremental learning beyond visual features alone.
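The core idea of the base learning stage can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names (`fuse_prototypes`, `classify`), the fusion weight `alpha`, and the toy 16-dimensional embeddings are illustrative assumptions. The sketch shows the general mechanism the abstract describes: each class prototype blends a text embedding (stable and forgetting-resistant, since it does not depend on stored image samples) with the mean visual feature of the few support images, and queries are classified by cosine similarity to the fused prototypes.

```python
import numpy as np

def l2_normalize(x, axis=-1, eps=1e-8):
    # Normalize embeddings so that dot products equal cosine similarities.
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)

def fuse_prototypes(text_emb, visual_protos, alpha=0.5):
    # Blend each class's text embedding (forgetting-resistant) with its
    # few-shot visual prototype (sample-specific); alpha=1.0 is purely textual.
    fused = alpha * l2_normalize(text_emb) + (1.0 - alpha) * l2_normalize(visual_protos)
    return l2_normalize(fused)

def classify(query_feats, prototypes):
    # Nearest-prototype classification under cosine similarity.
    sims = l2_normalize(query_feats) @ prototypes.T
    return sims.argmax(axis=1)

# Toy setup: 3 classes, 16-dim embeddings, 5 support images per class.
rng = np.random.default_rng(0)
dim, n_cls, k_shot = 16, 3, 5
class_dirs = l2_normalize(rng.normal(size=(n_cls, dim)))      # "true" class directions
text_emb = class_dirs + 0.05 * rng.normal(size=(n_cls, dim))  # text embeddings near the truth
support = class_dirs[:, None, :] + 0.2 * rng.normal(size=(n_cls, k_shot, dim))
visual_protos = support.mean(axis=1)                          # mean of the few support features

prototypes = fuse_prototypes(text_emb, visual_protos, alpha=0.5)
queries = class_dirs + 0.2 * rng.normal(size=(n_cls, dim))    # one noisy query per class
predictions = classify(queries, prototypes)
```

In an incremental session, prototypes of old classes built this way can be kept fixed while new-class prototypes are added, which is one reason textual knowledge helps against forgetting: the text side of each prototype never needs the old images again.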

    References
    [1] He KM, Zhang XY, Ren SQ, Sun J. Deep residual learning for image recognition. In:Proc. of the 2016 IEEE Conf. on Computer Vision and Pattern Recognition. Las Vegas:IEEE, 2016. 770-778.
    [2] Krizhevsky A, Sutskever I, Hinton GE. ImageNet classification with deep convolutional neural networks. In:Proc. of the 25th Int'l Conf. on Neural Information Processing Systems. Lake Tahoe:Curran Associates Inc., 2012. 1097-1105.
    [3] 刘颖, 雷研博, 范九伦, 王富平, 公衍超, 田奇. 基于小样本学习的图像分类技术综述. 自动化学报, 2021, 47(2):297-315.
    Liu Y, Lei YB, Fan JL, Wang FP, Gong YC, Tian Q. Survey on image classification technology based on small sample learning. Acta Automatica Sinica, 2021, 47(2):297-315 (in Chinese with English abstract).
    [4] 杜彦东, 冯林, 陶鹏, 龚勋, 王俊. 元迁移学习在少样本跨域图像分类中的研究. 中国图象图形学报, 2023, 28(9):2899-2912.
    Du YD, Feng L, Tao P, Gong X, Wang J. Research on meta-transfer learning in cross-domain image classification with few-shot. Journal of Image and Graphics, 2023, 28(9):2899-2912 (in Chinese with English abstract).
    [5] Finn C, Abbeel P, Levine S. Model-agnostic meta-learning for fast adaptation of deep networks. In:Proc. of the 34th Int'l Conf. on Machine Learning. Sydney:PMLR, 2017. 1126-1135.
    [6] Jamal MA, Qi GJ. Task agnostic meta-learning for few-shot learning. In:Proc. of the 2019 IEEE/CVF Conf. on Computer Vision and Pattern Recognition. Long Beach:IEEE, 2019. 11711-11719.
    [7] 葛轶洲, 刘恒, 王言, 徐百乐, 周青, 申富饶. 小样本困境下的深度学习图像识别综述. 软件学报, 2022, 33(1):193-210. http://www.jos.org.cn/1000-9825/6342.htm
    Ge YZ, Liu H, Wang Y, Xu BL, Zhou Q, Shen FR. Survey on deep learning image recognition in dilemma of small samples. Ruan Jian Xue Bao/Journal of Software, 2022, 33(1):193-210 (in Chinese with English abstract). http://www.jos.org.cn/1000-9825/6342.htm
    [8] Kirkpatrick J, Pascanu R, Rabinowitz N, Veness J, Desjardins G, Rusu AA, Milan K, Quan J, Ramalho T, Grabska-Barwinska A, Hassabis D, Clopath C, Kumaran D, Hadsell R. Overcoming catastrophic forgetting in neural networks. Proc. of the National Academy of Sciences of the United States of America, 2017, 114(13):3521-3526.
    [9] Lee SW, Kim JH, Jun J, Ha JW, Zhang BT. Overcoming catastrophic forgetting by incremental moment matching. In:Proc. of the 31st Int'l Conf. on Neural Information Processing Systems. Long Beach:Curran Associates Inc., 2017. 4655-4665.
    [10] Aljundi R, Babiloni F, Elhoseiny M, Rohrbach M, Tuytelaars T. Memory aware synapses:Learning what (not) to forget. In:Proc. of the 15th European Conf. on Computer Vision. Munich:Springer, 2018. 144-161.
    [11] 朱飞, 张煦尧, 刘成林. 类别增量学习研究进展和性能评价. 自动化学报, 2023, 49(3):635-660.
    Zhu F, Zhang XY, Liu CL. Class incremental learning:A review and performance evaluation. Acta Automatica Sinica, 2023, 49(3):635-660 (in Chinese with English abstract).
    [12] Zhao HB, Fu YJ, Li XW, Li SY, Omar B, Li X. Few-shot class-incremental learning via feature space composition. arXiv:2006.15524, 2020.
    [13] Tao XY, Hong XP, Chang XY, Dong SL, Wei X, Gong YH. Few-shot class-incremental learning. In:Proc. of the 2020 IEEE/CVF Conf. on Computer Vision and Pattern Recognition. Seattle:IEEE, 2020. 12180-12189.
    [14] Hersche M, Karunaratne G, Cherubini G, Benini L, Sebastian A, Rahimi A. Constrained few-shot class-incremental learning. In:Proc. of the 2022 IEEE/CVF Conf. on Computer Vision and Pattern Recognition. New Orleans:IEEE, 2022. 9047-9057.
    [15] Zhou DW, Wang FY, Ye HJ, Ma L, Pu SL, Zhan DC. Forward compatible few-shot class-incremental learning. In:Proc. of the 2022 IEEE/CVF Conf. on Computer Vision and Pattern Recognition. New Orleans:IEEE, 2022. 9036-9046.
    [16] Zhang C, Song N, Lin GS, Zheng Y, Pan P, Xu YH. Few-shot incremental learning with continually evolved classifiers. In:Proc. of the 2021 IEEE/CVF Conf. on Computer Vision and Pattern Recognition. Nashville:IEEE, 2021. 12450-12459.
    [17] 张浩宇, 王天保, 李孟择, 赵洲, 浦世亮, 吴飞. 视觉语言多模态预训练综述. 中国图象图形学报, 2022, 27(9):2652-2682.
    Zhang HY, Wang TB, Li MZ, Zhao Z, Pu SL, Wu F. Comprehensive review of visual-language-oriented multimodal pre-training methods. Journal of Image and Graphics, 2022, 27(9):2652-2682 (in Chinese with English abstract).
    [18] 殷炯, 张哲东, 高宇涵, 杨智文, 李亮, 肖芒, 孙垚棋, 颜成钢. 视觉语言预训练综述. 软件学报, 2023, 34(5):2000-2023. http://www.jos.org.cn/1000-9825/6774.htm
    Yin J, Zhang ZD, Gao YH, Yang ZW, Li L, Xiao M, Sun YQ, Yan CG. Survey on vision-language pre-training. Ruan Jian Xue Bao/Journal of Software, 2023, 34(5):2000-2023 (in Chinese with English abstract). http://www.jos.org.cn/1000-9825/6774.htm
    [19] 杜鹏飞, 李小勇, 高雅丽. 多模态视觉语言表征学习研究综述. 软件学报, 2021, 32(2):327-348. http://www.jos.org.cn/1000-9825/6125.htm
    Du PF, Li XY, Gao YL. Survey on multimodal visual language representation learning. Ruan Jian Xue Bao/Journal of Software, 2021, 32(2):327-348 (in Chinese with English abstract). http://www.jos.org.cn/1000-9825/6125.htm
    [20] Liu YY, Schiele B, Sun QR. An ensemble of epoch-wise empirical Bayes for few-shot learning. In:Proc. of the 16th European Conf. on Computer Vision. Glasgow:Springer, 2020. 404-421.
    [21] Park E, Oliva JB. Meta-curvature. In:Proc. of the 33rd Int'l Conf. on Neural Information Processing Systems. Vancouver:Curran Associates Inc., 2019. 298.
    [22] Ravi S, Larochelle H. Optimization as a model for few-shot learning. In:Proc. of the 5th Int'l Conf. on Learning Representations. Toulon:OpenReview.net, 2017.
    [23] Gidaris S, Komodakis N. Dynamic few-shot visual learning without forgetting. In:Proc. of the 2018 IEEE/CVF Conf. on Computer Vision and Pattern Recognition. Salt Lake City:IEEE, 2018. 4367-4375.
    [24] Hou RB, Chang H, Ma BP, Shan SG, Chen XL. Cross attention network for few-shot classification. In:Proc. of the 33rd Int'l Conf. on Neural Information Processing Systems. Vancouver:Curran Associates Inc., 2019. 360.
    [25] Zhang C, Cai YJ, Lin GS, Shen CH. DeepEMD:Few-shot image classification with differentiable earth mover's distance and structured classifiers. In:Proc. of the 2020 IEEE/CVF Conf. on Computer Vision and Pattern Recognition. Seattle:IEEE, 2020. 12200-12210.
    [26] Wang YX, Girshick R, Hebert M, Hariharan B. Low-shot learning from imaginary data. In:Proc. of the 2018 IEEE/CVF Conf. on Computer Vision and Pattern Recognition. Salt Lake City:IEEE, 2018. 7278-7286.
    [27] Satorras GV, Estrach BJ. Few-shot learning with graph neural networks. In:Proc. of the 6th Int'l Conf. on Learning Representations. Vancouver:OpenReview.net, 2018.
    [28] Kim J, Kim T, Kim S, Yoo CD. Edge-labeling graph neural network for few-shot learning. In:Proc. of the 2019 IEEE/CVF Conf. on Computer Vision and Pattern Recognition. Long Beach:IEEE, 2019. 11-20.
    [29] Gidaris S, Komodakis N. Generating classification weights with GNN denoising autoencoders for few-shot learning. In:Proc. of the 2019 IEEE/CVF Conf. on Computer Vision and Pattern Recognition. Long Beach:IEEE, 2019. 21-30.
    [30] Rebuffi SA, Kolesnikov A, Sperl G, Lampert CH. iCaRL:Incremental classifier and representation learning. In:Proc. of the 2017 IEEE Conf. on Computer Vision and Pattern Recognition. Honolulu:IEEE, 2017. 5533-5542.
    [31] Shin H, Lee JK, Kim J, Kim J. Continual learning with deep generative replay. In:Proc. of the 31st Int'l Conf. on Neural Information Processing Systems. Long Beach:Curran Associates Inc., 2017. 2994-3003.
    [32] Wu CS, Herranz L, Liu XL, Wang YX, van de Weijer J, Raducanu B. Memory replay GANs:Learning to generate images from new categories without forgetting. In:Proc. of the 32nd Int'l Conf. on Neural Information Processing Systems. Montréal:Curran Associates Inc., 2018. 5966-5976.
    [33] Kamra N, Gupta U, Liu Y. Deep generative dual memory network for continual learning. arXiv:1710.10368, 2017.
    [34] Liu XL, Masana M, Herranz L, Van de Weijer J, López AM, Bagdanov AD. Rotate your networks:Better weight consolidation and less catastrophic forgetting. In:Proc. of the 24th Int'l Conf. on Pattern Recognition (ICPR). Beijing:IEEE, 2018. 2262-2268.
    [35] Rusu AA, Rabinowitz NC, Desjardins G, Soyer H, Kirkpatrick J, Kavukcuoglu K, Pascanu R, Hadsell R. Progressive neural networks. arXiv:1606.04671, 2022.
    [36] Yoon J, Yang E, Lee J, Hwang SJ. Lifelong learning with dynamically expandable networks. In:Proc. of the 6th Int'l Conf. on Learning Representations. Vancouver:OpenReview.net, 2018.
    [37] Rajasegaran J, Hayat M, Khan SH, Khan FS, Shao L. Random path selection for continual learning. In:Proc. of the 33rd Int'l Conf. on Neural Information Processing Systems. Vancouver:NeurIPS, 2019. 12648-12658.
    [38] Zeng GX, Chen Y, Cui B, Yu S. Continual learning of context-dependent processing in neural networks. Nature Machine Intelligence, 2019, 1(8):364-372.
    [39] He X, Jaeger H. Overcoming catastrophic interference using conceptor-aided backpropagation. In:Proc. of the 6th Int'l Conf. on Learning Representations. Vancouver:OpenReview.net, 2018.
    [40] Farajtabar M, Azizan N, Mott A, Li A. Orthogonal gradient descent for continual learning. In:Proc. of the 23rd Int'l Conf. on Artificial Intelligence and Statistics. Palermo:PMLR, 2020. 3762-3773.
    [41] Ren MY, Liao RJ, Fetaya E, Zemel RS. Incremental few-shot learning with attention attractor networks. In:Proc. of the 33rd Int'l Conf. on Neural Information Processing Systems. Vancouver:Curran Associates Inc., 2019. 5275-5285.
    [42] Ayub A, Wagner AR. Cognitively-inspired model for incremental learning using a few examples. In:Proc. of the 2020 IEEE/CVF Conf. on Computer Vision and Pattern Recognition Workshops. Seattle:IEEE, 2020. 897-906.
    [43] Yang BY, Lin MB, Zhang YX, Liu BH, Liang XD, Ji RR, Ye QX. Dynamic support network for few-shot class incremental learning. IEEE Trans. on Pattern Analysis and Machine Intelligence, 2022, 45(3):2945-2951.
    [44] Zhu K, Cao Y, Zhai W, Cheng J, Zha ZJ. Self-promoted prototype refinement for few-shot class-incremental learning. In:Proc. of the 2021 IEEE/CVF Conf. on Computer Vision and Pattern Recognition. Nashville:IEEE, 2021. 6797-6806.
    [45] Akyürek AF, Akyürek E, Wijaya DT, Andreas J. Subspace regularizers for few-shot class incremental learning. In:Proc. of the 10th Int'l Conf. on Learning Representations. OpenReview.net, 2022.
    [46] Tian SS, Li LS, Li WJ, Ran H, Ning X, Tiwari P. A survey on few-shot class-incremental learning. arXiv:2304.08130, 2023.
    [47] Chi ZX, Gu L, Liu H, Wang Y, Yu YH, Tang J. MetaFSCIL:A meta-learning approach for few-shot class incremental learning. In:Proc. of the 2022 IEEE/CVF Conf. on Computer Vision and Pattern Recognition. New Orleans:IEEE, 2022. 14166-14175.
    [48] Zou YX, Zhang SH, Li YH, Li RX. Margin-based few-shot class-incremental learning with class-level overfitting mitigation. In:Proc. of the 36th Int'l Conf. on Neural Information Processing Systems. New Orleans:NeurIPS, 2022. 27267-27279.
    [49] Yang YB, Yuan HB, Li XT, Lin ZC, Torr PHS, Tao DC. Neural collapse inspired feature-classifier alignment for few-shot class-incremental learning. In:Proc. of the 11th Int'l Conf. on Learning Representations. Kigali:OpenReview.net, 2023.
    [50] Zhou DW, Ye HJ, Ma L, Xie D, Pu SL, Zhan DC. Few-shot class-incremental learning by sampling multi-phase tasks. IEEE Trans. on Pattern Analysis and Machine Intelligence, 2023, 45(11):12816-12831.
    [51] Liu H, Gu L, Chi ZX, Wang Y, Yu YH, Chen J, Tang J. Few-shot class-incremental learning via entropy-regularized data-free replay. In:Proc. of the 17th European Conf. on Computer Vision. Tel Aviv:Springer, 2022. 146-162.
    [52] Cheraghian A, Rahman S, Fang PF, Roy SK, Petersson L, Harandi M. Semantic-aware knowledge distillation for few-shot class-incremental learning. In:Proc. of the 2021 IEEE/CVF Conf. on Computer Vision and Pattern Recognition. Nashville:IEEE, 2021. 2534-2543.
    [53] Dong SL, Hong XP, Tao XY, Chang XY, Wei X, Gong YH. Few-shot class-incremental learning via relation knowledge distillation. In:Proc. of the 35th AAAI Conf. on Artificial Intelligence. AAAI, 2021. 1255-1263.
    [54] Radford A, Kim JW, Hallacy C, Ramesh A, Goh G, Agarwal S, Sastry G, Askell A, Mishkin P, Clark J, Krueger G, Sutskever I. Learning transferable visual models from natural language supervision. In:Proc. of the 38th Int'l Conf. on Machine Learning. PMLR, 2021. 8748-8763.
    [55] Alayrac JB, Donahue J, Luc P, Miech A, Barr I, Hasson Y, Lenc K, Mensch A, Millican K, Reynolds M, Ring R, Rutherford E, Cabi S, Han TD, Gong ZT, Samangooei S, Monteiro M, Menick JL, Borgeaud S, Brock A, Nematzadeh A, Sharifzadeh S, Binkowski M, Barreira R, Vinyals O, Zisserman A, Simonyan K. Flamingo:A visual language model for few-shot learning. In:Proc. of the 36th Int'l Conf. on Neural Information Processing Systems. New Orleans:NeurIPS, 2022. 23716-23736.
    [56] Jia C, Yang YF, Xia Y, Chen YT, Parekh Z, Pham H, Le QV, Sung YH, Li Z, Duerig T. Scaling up visual and vision-language representation learning with noisy text supervision. In:Proc. of the 38th Int'l Conf. on Machine Learning. PMLR, 2021. 4904-4916.
    [57] Krizhevsky A. Learning multiple layers of features from tiny images. Technical Report, Toronto:University of Toronto, 2009.
    [58] Wah C, Branson S, Welinder P, Perona P, Belongie S. The Caltech-UCSD Birds-200-2011 dataset. Technical Report, Pasadena:California Institute of Technology, 2011.
    [59] Krause J, Stark M, Deng J, Fei-Fei L. 3D object representations for fine-grained categorization. In:Proc. of the 2013 IEEE Int'l Conf. on Computer Vision Workshops. Sydney:IEEE, 2013. 554-561.
    [60] Dosovitskiy A, Beyer L, Kolesnikov A, Weissenborn D, Zhai XH, Unterthiner T, Dehghani M, Minderer M, Heigold G, Gelly S, Uszkoreit J, Houlsby N. An image is worth 16x16 words:Transformers for image recognition at scale. In:Proc. of the 9th Int'l Conf. on Learning Representations. OpenReview.net, 2021.
    [61] Castro FM, Marín-Jiménez MJ, Guil N, Schmid C, Alahari K. End-to-end incremental learning. In:Proc. of the 15th European Conf. on Computer Vision. Munich:Springer, 2018. 241-257.
Cite this article:

Yao HT, Yu L, Xu CS. Few-shot incremental learning with textual-knowledge embedding by visual-language model. Ruan Jian Xue Bao/Journal of Software, 2024, 35(5): 2101-2119 (in Chinese with English abstract).
History
  • Received: 2023-04-06
  • Revised: 2023-06-08
  • Published online: 2023-09-11
  • Published: 2024-05-06