国家自然科学基金(61371196); 中国博士后科学基金(2015M582832); 国家科技重大专项(2015ZX01040201-003)
实体分辨广泛地存在于数据质量控制、信息检索、数据集成等数据任务中. 传统的实体分辨主要面向关系型数据, 而随着大数据技术的发展, 文本、图像等模态不同的数据大量涌现催生了跨模态数据应用需求, 将跨模态数据实体分辨提升为大数据处理和分析的基础问题之一. 对跨模态实体分辨问题的研究进展进行回顾, 首先介绍问题的定义、评价指标; 然后, 以模态内关系的保持和模态间关系的建立为主线, 对现有研究进行总结和梳理; 并且, 通过在多个公开数据集上对常用方法进行测试, 对出现差异的原因和进行分析; 最后, 总结当前研究仍然存在的问题, 并依据这些问题给出未来可能的研究方向.
Entity resolution widely exists in data tasks such as data quality control, information retrieval, and data integration. Traditional entity resolution methods mainly focus on relational data, while with the development of big data technology, the application requirements of cross-modal data are generated due to the proliferation of different modal data including texts and images. Hence, cross-modal data entity resolution has become a fundamental problem in big data processing and analysis. In this study, the research development of cross-modal entity resolution is reviewed, and its definition and evaluation indexes are introduced. Then, with the construction of inter-modal relationships and the maintenance of intra-modal relationships as the main line, existing research results are surveyed. In addition, widely used methods are tested on different open datasets, and their differences and reasons behind them are analyzed. Finally, the problems in the present research are concluded, on the basis of which the future research trends are given.