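[Keywords]
[Abstract]
In Web data integration, how to select an appropriate number of data sources from a large collection of Web data sources, so that a given query requirement is met while the number of sources that must be accessed is kept as small as possible and the returned results remain of high quality, has become a hot topic. Against the background of research and practice over the past decade or more, this paper introduces the history and current state of research on Web data source selection and classifies Web data source selection methods. It discusses the research motivations, methods, and results of relevance-based and quality-based data source selection, and compares the goals, key techniques, advantages, and disadvantages of the related work. Finally, it outlines future research directions for Web data source selection.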
[Keywords]
[Abstract]
In Web data integration, an active research topic is how to select an appropriate number of data sources from a large collection of Web data sources so that specific query requirements are satisfied, the number of data sources accessed is minimized, and the quality of the returned results is kept high. Drawing on the research and practice of the past decade or so, this paper reviews the evolution and current state of research on Web data source selection and classifies Web data source selection methods. It then discusses the research motivations, methods, and results of relevance-based and quality-based data source selection, and compares the objectives, key techniques, merits, and demerits of the related work. Finally, some directions for future research on Web data source selection are put forward.
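[CLC number]
[Funding]
National Natural Science Foundation of China (61173146); Jiangxi Provincial Higher Education Science and Technology Landing Program (Industry-University-Research Cooperation) (KJLD12022); Science and Technology Project of the Jiangxi Provincial Department of Education (GJJ12733, GJJ12732, GJJ11729)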