Multimodal-guided Local Feature Selection for Few-shot Learning

About the authors:

Lü Tiangen (吕天根, 1997-), male, master's student, CCF student member; his research interest is few-shot learning. Hong Richang (洪日昌, 1981-), male, Ph.D., professor, doctoral supervisor, CCF professional member; his research interests include multimedia technology, artificial intelligence, and big data. He Jun (何军, 1992-), male, Ph.D.; his research interests include pattern recognition, few-shot learning, and weakly supervised learning. Hu Shejiao (胡社教, 1964-), male, Ph.D., professor; his research interests include intelligent detection and signal processing, intelligent distribution transformer terminal systems, and embedded control systems.

Corresponding author:

Hong Richang (洪日昌), hongrc.hfut@gmail.com

Funding:

National Natural Science Foundation of China (61932009)



    Abstract:

    Deep learning models have yielded impressive results in many tasks, but their success hinges on the availability of large numbers of labeled training samples, and they tend to perform poorly in scenarios where labeled samples are scarce. In recent years, few-shot learning (FSL), which studies how to learn quickly from a small number of samples, has been proposed to address this problem and has achieved good performance, mainly by training models with meta-learning. Nevertheless, two issues remain: 1) existing FSL methods usually recognize novel classes solely from the visual features of samples, without integrating information from other modalities; 2) under the meta-learning paradigm, a model learns generic, transferable knowledge from massive similar few-shot tasks, which inevitably drives its feature space toward over-generalization and leads to insufficient and inaccurate representations of sample features. To tackle these two issues, this study introduces pre-training and multimodal learning techniques into the FSL process and proposes a multimodal-guided local feature selection method for few-shot learning. Specifically, the model is first pre-trained on known classes with abundant samples to substantially improve its feature representation ability. In the meta-learning stage, the pre-trained model is further optimized by meta-learning to improve its transferability, that is, its adaptability to the few-shot environment; meanwhile, local feature selection is carried out on the basis of both the visual and the textual features of samples to strengthen the representation of sample features and avoid a sharp degradation of the model's representation ability. Finally, the resulting sample features are used for FSL. Experiments on three benchmark datasets, namely MiniImageNet, CIFAR-FS, and FC-100, demonstrate that the proposed FSL method achieves better results.
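    The abstract describes the method only at a high level. To make the local feature selection idea concrete, below is a minimal PyTorch sketch under a prototypical-network-style setup; the function names, tensor shapes, cosine-similarity scoring, and top-k selection rule are all illustrative assumptions, not the paper's actual implementation.

        # Hypothetical sketch of multimodal-guided local feature selection.
        # Not the authors' code; shapes and the selection rule are assumptions.
        import torch
        import torch.nn.functional as F

        def select_local_features(local_feats, text_emb, k=32):
            """Keep the k local descriptors most relevant to the class text embedding.

            local_feats: (N, C) local visual descriptors, e.g., a flattened CNN feature map
            text_emb:    (C,)   class-name embedding projected into the visual feature space
            """
            # Score every local descriptor by cosine similarity to the text embedding.
            scores = F.cosine_similarity(local_feats, text_emb.unsqueeze(0), dim=1)  # (N,)
            top_idx = scores.topk(k).indices
            return local_feats[top_idx]  # (k, C)

        def class_prototype(support_local_feats, text_emb, k=32):
            """Average the selected locals of all support images into a class prototype."""
            selected = [select_local_features(f, text_emb, k) for f in support_local_feats]
            return torch.cat(selected, dim=0).mean(dim=0)  # (C,)

    In such a setup, a query image would be classified by comparing its (similarly selected) features with each class prototype, and the pre-trained backbone producing local_feats would be further refined by meta-learning over episodes.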

Cite this article:

Lü TG, Hong RC, He J, Hu SJ. Multimodal-guided local feature selection for few-shot learning. Ruan Jian Xue Bao/Journal of Software, 2023, 34(5): 2068–2082 (in Chinese).

History
  • Received: 2022-04-18
  • Last revised: 2022-05-29
  • Published online: 2022-09-20
  • Published: 2023-05-06