Abstract: In recent years, zero-shot learning has attracted extensive attention in machine learning and computer vision. Conventional inductive zero-shot learning attempts to establish mappings between semantic and visual features to transfer knowledge between classes. However, such approaches suffer from the projection domain shift between the seen and unseen classes. Transductive zero-shot learning has been proposed to alleviate this issue by leveraging unlabeled unseen data for domain adaptation during training. Unfortunately, empirical studies find that transductive zero-shot learning approaches which optimize the semantic mapping and domain adaptation in the visual feature space simultaneously are prone to "mutual restriction", which limits the potential of both steps. To address this issue, a novel transductive zero-shot learning approach named feature generation with indirect domain adaptation (FG-IDA) is proposed, which conducts the semantic mapping and domain adaptation sequentially and optimizes the two steps in separate spaces, thereby unlocking their performance potential and further improving zero-shot recognition accuracy. FG-IDA is evaluated on four benchmarks, namely CUB, AWA1, AWA2, and SUN. The experimental results demonstrate the superiority of the proposed method over other transductive zero-shot learning approaches, and also show that FG-IDA achieves state-of-the-art performance on the CUB, AWA1, and AWA2 datasets. Moreover, a detailed ablation analysis is conducted, and the results empirically confirm the existence of the "mutual restriction" effect in direct domain adaptation-based transductive zero-shot learning approaches as well as the effectiveness of the indirect domain adaptation idea.