Cross-modal Self-distillation for Zero-shot Sketch-based Image Retrieval
Author: Tian JL, Xu X, Shen FM, Shen HT

CLC Number: TP391
    Abstract:

Zero-shot sketch-based image retrieval uses sketches of unseen classes as queries to retrieve images of those classes. The task thus faces two challenges: the modal gap between sketches and images, and the semantic inconsistency between seen and unseen classes. Previous approaches tried to eliminate the modal gap by projecting sketches and images into a common space, and to bridge the semantic inconsistency between seen and unseen classes with semantic embeddings (e.g., word vectors and word similarity). This study proposes a cross-modal self-distillation approach that learns generalizable features from the perspective of knowledge distillation, without involving semantic embeddings in training. Specifically, the knowledge of a pre-trained image recognition network is first transferred to the student network through traditional knowledge distillation. Then, exploiting the cross-modal correlation between sketches and images, cross-modal self-distillation indirectly transfers this knowledge to the sketch modality, enhancing the discriminability and generalizability of sketch features. To further promote the integration and propagation of knowledge within the sketch modality, this study also proposes sketch self-distillation. By learning discriminative and generalizable features from the data, the student network eliminates the modal gap and semantic inconsistency. Extensive experiments on three benchmark datasets, namely Sketchy, TU-Berlin, and QuickDraw, demonstrate the superiority of the proposed cross-modal self-distillation approach over state-of-the-art methods.
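The two distillation steps described in the abstract — a frozen image-recognition teacher supervising a student on images, and the same soft targets reused across modalities to supervise the paired sketch — can be illustrated with a toy sketch. This is a minimal, hypothetical example assuming Hinton-style temperature-softened KL distillation; the function names, temperature, and logit values are illustrative assumptions, not the paper's implementation.

```python
import math

def softmax(logits, T=1.0):
    # Temperature-softened softmax; a higher T yields softer targets.
    exps = [math.exp(z / T) for z in logits]
    s = sum(exps)
    return [e / s for e in exps]

def kd_loss(student_logits, teacher_logits, T=4.0):
    # KL(teacher || student) on temperature-softened distributions,
    # scaled by T^2 as in Hinton-style knowledge distillation.
    p = softmax(teacher_logits, T)   # soft targets from the frozen teacher
    q = softmax(student_logits, T)   # student predictions
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))
    return T * T * kl

# Step 1 (traditional KD): the pre-trained image-recognition teacher
# supervises the student on the image modality.
image_teacher_logits = [2.0, 0.5, -1.0]   # illustrative values
image_student_logits = [1.5, 0.7, -0.8]
loss_img = kd_loss(image_student_logits, image_teacher_logits)

# Step 2 (cross-modal self-distillation, sketched): the teacher's soft
# targets for an image are reused to supervise the student's logits for
# the paired sketch, indirectly transferring image knowledge to sketches.
sketch_student_logits = [1.0, 0.9, -0.5]
loss_sketch = kd_loss(sketch_student_logits, image_teacher_logits)

print(loss_img, loss_sketch)
```

Both losses vanish only when the student's softened distribution matches the teacher's, which is what drives the sketch branch toward the image teacher's class structure without any semantic embeddings.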

Citation: Tian JL, Xu X, Shen FM, Shen HT. Cross-modal self-distillation for zero-shot sketch-based image retrieval. Ruan Jian Xue Bao/Journal of Software, 2022, 33(9): 3152–3164 (in Chinese with English abstract).
History
  • Received: June 27, 2021
  • Revised: August 15, 2021
  • Online: February 22, 2022
  • Published: September 6, 2022