Partial Multimodal Hashing Based on Fine-grained Feature Fusion
Abstract:

Due to the exponential growth of multimodal data, traditional databases face challenges in both storage and retrieval. Multimodal hashing can effectively reduce storage cost and improve retrieval efficiency by fusing multimodal features and mapping them into binary hash codes. Although many multimodal hashing methods perform well, three important problems remain to be solved: (1) existing methods tend to assume that all samples are modality-complete, whereas in practical retrieval scenarios samples with missing modalities are common; (2) most methods are based on shallow learning models, which inevitably limits the models' learning ability and harms the final retrieval performance; (3) some deep-learning-based methods have been proposed to address the weak learning ability, but they apply coarse-grained feature fusion, such as direct concatenation, after extracting features from different modalities, which fails to capture deep semantic information, thereby weakening the representation ability of the hash codes and degrading retrieval performance. To address these problems, the PMH-F3 model is proposed. The model performs partial multimodal hashing for samples with missing modalities. It is built on a deep network architecture and uses a Transformer encoder to capture deep semantics via self-attention, achieving fine-grained multimodal feature fusion. Extensive experiments are conducted on the MIR Flickr and MS COCO datasets, where the model achieves the best retrieval performance. The experimental results show that the PMH-F3 model can effectively implement partial multimodal hashing and can be applied to large-scale multimodal data retrieval.
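The key ideas summarized in the abstract — token-level fusion of image and text features via self-attention, masking of missing modalities instead of discarding incomplete samples, and sign-based binarization into hash codes — can be sketched as follows. This is a minimal illustrative NumPy sketch, not the authors' PMH-F3 implementation: all names (`self_attention`, `W_hash`), dimensions, and the single-layer attention are assumptions for illustration.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(tokens, W_q, W_k, W_v, mask):
    # Every token attends to every unmasked token, so image-region and
    # word features are fused at the token (fine-grained) level.
    Q, K, V = tokens @ W_q, tokens @ W_k, tokens @ W_v
    scores = Q @ K.T / np.sqrt(K.shape[-1])
    scores = np.where(mask[None, :], scores, -1e9)  # hide missing-modality tokens
    return softmax(scores, axis=-1) @ V

rng = np.random.default_rng(0)
d, n_img, n_txt, n_bits = 16, 4, 6, 32            # assumed toy dimensions

img_tokens = rng.normal(size=(n_img, d))           # e.g. image-region features
txt_tokens = rng.normal(size=(n_txt, d))           # e.g. word embeddings
tokens = np.vstack([img_tokens, txt_tokens])       # joint token sequence

# Partial-modality case: this sample is missing its text, so the text
# tokens are masked out rather than the sample being discarded.
mask = np.array([True] * n_img + [False] * n_txt)

W_q, W_k, W_v = (rng.normal(size=(d, d)) * 0.1 for _ in range(3))
W_hash = rng.normal(size=(d, n_bits)) * 0.1        # hypothetical hash projection

# Pool only the observed tokens, then binarize with sign() to get the code.
fused = self_attention(tokens, W_q, W_k, W_v, mask)[mask].mean(axis=0)
hash_code = np.sign(fused @ W_hash)                # binary code in {-1, +1}^32
```

In a real model, the projections would be learned end-to-end and a full Transformer encoder (multi-head attention, layer normalization, feed-forward blocks) would replace the single attention layer; retrieval then compares the binary codes by Hamming distance, which is what makes large-scale search cheap.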

Citation:

Yin ZZ, Li BH, Wang M, Huang RL, Wu WL, Wang HF. Partial multimodal hashing based on fine-grained feature fusion. Ruan Jian Xue Bao/Journal of Software, 2024, 35(3): 1074–1089 (in Chinese).

History
  • Received: July 17, 2023
  • Revised: September 5, 2023
  • Online: November 8, 2023
  • Published: March 6, 2024