Abstract:Deep learning models have yielded impressive results in many tasks. However, the success hinges on the availability a large amount of labeled samples for model training, and it is agreed that deep models tend to perform poorly in scenarios where labeled samples are scarce. To this end, recently few-shot learning (FSL) has been proposed to study how to learn quickly from a small number of samples, and has achieved good performance with the adoption of meta-learning. Nevertheless, two issues exist:1) Existing FSL methods usually manage to recognize novel concepts solely based on the visual characteristics of samples, without integrating information from other modalities; 2) By following the paradigm of meta-learning, a model aims at learning generic and transferable knowledge from lots of similar fake few-shot tasks, which inevitably leads to a feature space of good transferability but with weak representation ability. To tackle the two issues, we introduce model pre-training techniques and multimodal learning techniques into the FSL process in this paper, and propose a new multimodal-guided local feature selection strategy for few-shot learning. Specifically, we first train the target model for recognizing a set of known or seen classes, each with abundant samples, which greatly improves the representation learning ability of the model. Then, in the meta-learning stage, we further optimize the pre-trained model on a set of randomly sampled few-shot tasks, which improves its transferability or its ability of adapting to the challenging FSL environment. The proposed multimodal-guided local feature selection strategy employed during mete-learning leverages both visual features and textual features, which helps construct more discriminative features and alleviate degradation of the model's representation ability. The resultant sample features are finally utilized for few-shot learning. Experiments on three benchmark datasets, namely miniImageNet, CIFAR-FS and FC100, demonstrate that our proposed FSL method can achieve better results.