Abstract: Most cross-modal hash retrieval methods rely solely on cosine similarity for feature matching, use a single calculation method, and ignore the impact of instance relations on performance. To address this, the study proposes a novel method based on reasoning over multiple instance relation graphs. Global and local instance relation graphs are generated by constructing similarity matrices, so that fine-grained relations among instances are fully explored. Similarity reasoning is then conducted on the basis of these multiple instance relation graphs: reasoning is first performed within the relation graphs of the image and text modalities, respectively; the relations within each modality are then mapped onto the instance graphs for reasoning; finally, reasoning is performed within the instance graphs themselves. Furthermore, the neural network is trained with a step-by-step training strategy so that it adapts to the characteristics of the image and text modalities. Experiments on the MIRFlickr and NUS-WIDE datasets demonstrate that the proposed method has distinct advantages in mean average precision (mAP) and obtains a favorable Top-k precision curve, indicating that it deeply explores instance relations and thereby significantly improves retrieval performance.
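The pipeline sketched in the abstract — a similarity matrix built from features, thresholded into an instance relation graph, followed by a reasoning (propagation) step — can be illustrated as follows. This is a minimal sketch, not the authors' implementation: the thresholding rule, the propagation update, and all parameter values here are assumptions chosen for illustration.

```python
import numpy as np

def cosine_similarity_matrix(X):
    # Row-normalize the feature vectors, then S = Xn @ Xn.T
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    Xn = X / np.clip(norms, 1e-12, None)
    return Xn @ Xn.T

def relation_graph(S, threshold=0.2):
    # Hypothetical rule: keep an edge where similarity exceeds a threshold
    A = (S > threshold).astype(float)
    np.fill_diagonal(A, 0.0)  # no self-loops
    return A

def propagate(X, A):
    # One illustrative reasoning step: blend each instance's feature
    # with the mean feature of its graph neighbors
    deg = A.sum(axis=1, keepdims=True)
    neighbor_mean = A @ X / np.clip(deg, 1.0, None)
    return 0.5 * (X + neighbor_mean)

rng = np.random.default_rng(0)
feats = rng.normal(size=(6, 8))          # 6 instances, 8-dim features
S = cosine_similarity_matrix(feats)       # instance similarity matrix
A = relation_graph(S)                     # instance relation graph
refined = propagate(feats, A)             # features after one reasoning step
```

In the full method this kind of propagation would run separately inside the image-modality and text-modality relation graphs before the relations are mapped onto the instance graphs.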