Non-Cooperative Deep Web Data Source Selection Based on Subject and Probability Model

Fund Project:

National Natural Science Foundation of China (61462037, 61562032, 61173146, 61363039, 61363010); Natural Science Foundation of Jiangxi Province of China (20152ACB20003); Science and Technology Landing Plan of Colleges in Jiangxi Province of China (KJLD12022, KJLD14035)

    Abstract:

    In a deep Web data integration system, users want high-quality query results from only a few data sources, which makes data source selection one of the system's core technologies. This paper proposes a method for selecting deep Web data sources based on correlation and diversity, suited to document summaries built from small-scale samples. First, to capture the correlation between a query and the data sources, a hierarchical subject summary is constructed together with a probability model of each data source's correlation deviation, so that data sources can be discriminated. A method is then described for constructing this deviation probability model from manual feedback and the correlation measurements of the data sources. To account for diversity among data sources, diversity-oriented directed edges are added to the hierarchical subject summary, and an evaluation metric is proposed to measure data source diversity. Data source selection based on correlation and diversity is then treated as a combinatorial optimization problem, and the selection result is obtained by solving an optimization function. Experimental results show that the proposed method achieves better selection accuracy when data sources are selected from a small number of sampled documents.
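
    The trade-off between correlation and diversity described in the abstract can be illustrated with a small sketch. The code below is not the authors' optimization function, which the abstract does not specify. It swaps in a simple greedy, MMR-style heuristic. All names (`select_sources`, `jaccard`, the weight `lam`), the use of flat topic sets in place of the hierarchical subject summary, and the sample scores are assumptions made only for illustration.

```python
from typing import Dict, List, Set


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard similarity between two topic sets (1.0 when both are empty)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def select_sources(
    relevance: Dict[str, float],
    topics: Dict[str, Set[str]],
    k: int,
    lam: float = 0.7,
) -> List[str]:
    """Greedily pick up to k sources, trading off relevance and novelty.

    Hypothetical scoring, assumed for this sketch only:
        gain(s) = lam * relevance[s] + (1 - lam) * novelty(s)
    where novelty(s) is 1 minus the highest topic overlap between s
    and any source already selected.
    """
    selected: List[str] = []
    candidates = set(relevance)
    while candidates and len(selected) < k:

        def gain(s: str) -> float:
            if selected:
                overlap = max(jaccard(topics[s], topics[t]) for t in selected)
            else:
                overlap = 0.0
            return lam * relevance[s] + (1 - lam) * (1.0 - overlap)

        # Sort first so that ties are broken deterministically by name.
        best = max(sorted(candidates), key=gain)
        selected.append(best)
        candidates.remove(best)
    return selected


# Made-up example: A and B cover the same topics, C covers a different one.
relevance = {"A": 0.9, "B": 0.85, "C": 0.6}
topics = {"A": {"books", "fiction"}, "B": {"books", "fiction"}, "C": {"music"}}
print(select_sources(relevance, topics, k=2, lam=0.5))
```

    With `lam=0.5`, the sketch picks A and then C, because B duplicates the topics of A. With `lam=1.0`, diversity is ignored and the two most relevant sources, A and B, are chosen. A greedy heuristic of this kind does not guarantee an optimal solution to the underlying combinatorial problem.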

Get Citation

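Deng Song, Wan Changxuan. Non-cooperative deep Web data source selection based on subject and probability model. Journal of Software, 2017, 28(12): 3241-3256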

History
  • Received: October 12, 2016
  • Revised: March 21, 2017
  • Online: December 15, 2017