Abstract: This paper proposes a novel integrated indexing structure for large-scale cross-media retrieval. A cross reference graph (CRG) is first generated by hyperlink analysis of webpages to support cross-media retrieval, and the CRG is then refined using users' relevance feedback. Retrieval proceeds in three steps: the user submits a query media object, the candidate media objects are quickly identified by searching the CRG, and distances to the candidate vectors are computed to obtain the answer set. Analysis and experimental results show that the proposed algorithm outperforms sequential scan.
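The query procedure described above can be illustrated with a minimal sketch. It is not the paper's implementation: the CRG is assumed to be a plain adjacency map, all media objects are assumed to share one feature space with Euclidean distance, and the names `crg_candidates`, `retrieve`, `sequential_scan`, `hops`, and `k`, as well as the toy data, are hypothetical.

```python
from math import dist

# Hypothetical toy CRG: each media object maps to the objects it is linked with.
crg = {
    "img1": ["aud1", "txt1"],
    "aud1": ["img1"],
    "txt1": ["img1", "img2"],
    "img2": ["txt1"],
    "img3": [],
}

# Hypothetical feature vectors, assumed to live in one common space.
vectors = {
    "img1": (0.0, 0.0),
    "aud1": (1.0, 0.0),
    "txt1": (0.0, 2.0),
    "img2": (3.0, 3.0),
    "img3": (0.5, 0.5),
}

def crg_candidates(crg, query_id, hops=1):
    """Collect the objects reachable from query_id within `hops` CRG links."""
    frontier, seen = {query_id}, {query_id}
    for _ in range(hops):
        frontier = {n for node in frontier for n in crg.get(node, ())} - seen
        seen |= frontier
    return seen - {query_id}

def retrieve(crg, vectors, query_id, k=2, hops=1):
    """Filter candidates through the CRG, then rank only those by distance."""
    q = vectors[query_id]
    candidates = crg_candidates(crg, query_id, hops)
    # Ties are broken by object id so the ordering is deterministic.
    return sorted(candidates, key=lambda c: (dist(q, vectors[c]), c))[:k]

def sequential_scan(vectors, query_id, k=2):
    """Baseline: compute the distance to every other object in the collection."""
    q = vectors[query_id]
    others = [o for o in vectors if o != query_id]
    return sorted(others, key=lambda c: (dist(q, vectors[c]), c))[:k]

print(retrieve(crg, vectors, "img1", k=2))
print(sequential_scan(vectors, "img1", k=2))
```

In this sketch the graph search limits distance computations to objects linked to the query, whereas the baseline touches every vector. The two can return different answer sets, because an object with no link path to the query (here `img3`) is never considered as a candidate.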