Anchor-based Unsupervised Cross-modal Hashing
Authors:

Hu Peng (1990-), male, PhD, associate researcher, doctoral supervisor, CCF professional member; main research interests: machine learning and multimedia analysis. Peng Xi (1983-), male, PhD, professor, doctoral supervisor, CCF professional member; main research interests: machine learning and multimedia analysis. Peng Dezhong (1975-), male, PhD, professor, doctoral supervisor, CCF professional member; main research interests: blind signal processing and neural networks.

Corresponding author:

Peng Xi, E-mail: pengx.gm@gmail.com

CLC number:

TP301

Funding:

National Natural Science Foundation of China (62102274, 62176171, U21B2040, U19A2078); Sichuan Science and Technology Program (2021YFS0389, 2022YFQ0014, 2022YFSY0047, 2022YFH0021); Fundamental Research Funds for the Central Universities (YJ202140); China Postdoctoral Science Foundation (2021M692270)



Abstract:

Thanks to its low storage cost and high retrieval speed, graph-based unsupervised cross-modal hashing has attracted wide attention from both academia and industry and has become an indispensable tool for cross-modal retrieval. However, the high computational complexity of graph construction prevents its application to large-scale multi-modal scenarios. This study mainly attempts to solve two important challenges faced by graph-based unsupervised cross-modal hash learning: 1) how to construct graphs efficiently in unsupervised cross-modal hash learning, and 2) how to handle the discrete-value optimization problem in cross-modal hash learning. To address these two problems, this study proposes anchor-based cross-modal learning and a differentiable hash layer. Specifically, it first randomly samples image-text pairs from the training set as an anchor set and uses this anchor set as an intermediary to compute the graph matrix of each batch of data; this graph matrix then guides cross-modal hash learning, which remarkably reduces the space and time cost. Second, the proposed differentiable hash layer computes directly on binary codes during forward propagation and still produces gradients to update the network during backpropagation, without continuous-value relaxation, thus yielding better hash codes. Finally, the study introduces a cross-modal ranking loss so that ranking results are taken into account during training, which improves cross-modal retrieval accuracy. The effectiveness of the proposed algorithm is verified by comparing it with 10 cross-modal hashing algorithms on three widely used datasets.
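The abstract describes the two core mechanisms concretely enough to sketch. The following minimal PyTorch snippet is an illustrative assumption on the editor's part, not the authors' released implementation: it shows a sign-based hash layer whose backward pass uses a straight-through gradient (so binary codes are used in the forward pass with no continuous-value relaxation), and an anchor-mediated batch graph built from a small random anchor set instead of the full O(n^2) pairwise graph. All names (SignSTE, hash_layer, anchor_graph) and the MSE alignment loss are hypothetical simplifications; the paper's full objective also includes the cross-modal ranking loss.

# Minimal sketch, assuming PyTorch; names and the loss are illustrative,
# not the authors' implementation.
import torch
import torch.nn.functional as F

class SignSTE(torch.autograd.Function):
    """Differentiable hash layer: binarize with sign() in the forward
    pass; pass the gradient straight through in the backward pass,
    so no continuous-value relaxation is needed."""
    @staticmethod
    def forward(ctx, x):
        return torch.sign(x)

    @staticmethod
    def backward(ctx, grad_output):
        return grad_output  # straight-through estimator

def hash_layer(x):
    return SignSTE.apply(x)

def anchor_graph(batch_feat, anchor_feat):
    """Use a small anchor set as an intermediary: compute batch-to-anchor
    cosine similarities A, then take S = A A^T as the batch graph.
    Cost is O(batch_size * m) per batch rather than O(n^2) overall."""
    b = F.normalize(batch_feat, dim=1)
    a = F.normalize(anchor_feat, dim=1)
    A = b @ a.t()                             # (batch, m)
    S = A @ A.t()                             # (batch, batch)
    return S / S.abs().max().clamp(min=1e-8)  # scale into [-1, 1]

if __name__ == "__main__":
    torch.manual_seed(0)
    img_feat = torch.randn(32, 64, requires_grad=True)  # image encoder output
    txt_feat = torch.randn(32, 64, requires_grad=True)  # text encoder output
    anchors = torch.randn(300, 64)                      # random anchor features
    S = anchor_graph(img_feat.detach(), anchors)        # graph guidance
    b_img, b_txt = hash_layer(img_feat), hash_layer(txt_feat)
    code_sim = (b_img @ b_txt.t()) / b_img.shape[1]     # in [-1, 1]
    loss = F.mse_loss(code_sim, S)   # align code similarity to the graph
    loss.backward()                  # gradients flow through SignSTE
    print(f"toy loss: {loss.item():.4f}")

The key points this sketch captures are that SignSTE.apply emits true binary codes in the forward pass while its identity backward still lets gradients reach the encoders, and that an anchor set of size m ≪ n keeps graph construction linear in the batch size.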

Cite this article:

Hu P, Peng X, Peng DZ. Anchor-based unsupervised cross-modal hashing. Ruan Jian Xue Bao/Journal of Software, 2024, 35(8): 3739–3751 (in Chinese).
History:
  • Received: 2021-08-30
  • Last revised: 2022-10-13
  • Published online: 2023-09-06
  • Published: 2024-08-06