Cross-modal Self-distillation for Zero-shot Sketch-based Image Retrieval
Author: Tian JL, Xu X, Shen FM, Shen HT

CLC Number: TP391
    Abstract:

Zero-shot sketch-based image retrieval uses sketches of unseen classes as queries to retrieve images of those classes. The task thus faces two challenges: the modal gap between sketches and images, and the semantic inconsistency between seen and unseen classes. Previous approaches tried to eliminate the modal gap by projecting sketches and images into a common space, and to bridge the semantic inconsistency between seen and unseen classes with semantic embeddings (e.g., word vectors and word similarity). This study proposes a cross-modal self-distillation approach that learns generalizable features from the perspective of knowledge distillation, without involving semantic embeddings in training. Specifically, the knowledge of a pre-trained image recognition network is first transferred to the student network through traditional knowledge distillation. Then, exploiting the cross-modal correlation between sketches and images, cross-modal self-distillation indirectly transfers this knowledge to the sketch modality, enhancing the discriminability and generalizability of sketch features. To further promote the integration and propagation of knowledge within the sketch modality, this study also proposes sketch self-distillation. By learning discriminative and generalizable features from the data, the student network eliminates the modal gap and semantic inconsistency. Extensive experiments on three benchmark datasets, namely Sketchy, TU-Berlin, and QuickDraw, demonstrate the superiority of the proposed cross-modal self-distillation approach over state-of-the-art methods.
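The two distillation steps described in the abstract — a frozen image-recognition teacher supervising a student on images, and the same soft targets reused across modalities to supervise the paired sketch — can be illustrated with a toy sketch. This is a minimal, hypothetical example assuming Hinton-style temperature-softened KL distillation; the function names, temperature, and logit values are illustrative assumptions, not the paper's implementation.

```python
import math

def softmax(logits, T=1.0):
    # Temperature-softened softmax; a higher T yields softer targets.
    exps = [math.exp(z / T) for z in logits]
    s = sum(exps)
    return [e / s for e in exps]

def kd_loss(student_logits, teacher_logits, T=4.0):
    # KL(teacher || student) on temperature-softened distributions,
    # scaled by T^2 as in Hinton-style knowledge distillation.
    p = softmax(teacher_logits, T)   # soft targets from the frozen teacher
    q = softmax(student_logits, T)   # student predictions
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))
    return T * T * kl

# Step 1 (traditional KD): the pre-trained image-recognition teacher
# supervises the student on the image modality.
image_teacher_logits = [2.0, 0.5, -1.0]   # illustrative values
image_student_logits = [1.5, 0.7, -0.8]
loss_img = kd_loss(image_student_logits, image_teacher_logits)

# Step 2 (cross-modal self-distillation, sketched): the teacher's soft
# targets for an image are reused to supervise the student's logits for
# the paired sketch, indirectly transferring image knowledge to sketches.
sketch_student_logits = [1.0, 0.9, -0.5]
loss_sketch = kd_loss(sketch_student_logits, image_teacher_logits)

print(loss_img, loss_sketch)
```

Both losses vanish only when the student's softened distribution matches the teacher's, which is what drives the sketch branch toward the image teacher's class structure without any semantic embeddings.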

Citation: Tian JL, Xu X, Shen FM, Shen HT. Cross-modal self-distillation for zero-shot sketch-based image retrieval. Ruan Jian Xue Bao/Journal of Software, 2022, 33(9): 3152–3164 (in Chinese with English abstract).
History
  • Received: June 27, 2021
  • Revised: August 15, 2021
  • Online: February 22, 2022
  • Published: September 6, 2022