Classification of Deep Web Databases Based on the Context of Web Pages
DOI:
Author:
Affiliation:

Clc Number:

Fund Project:

  • Article
  • |
  • Figures
  • |
  • Metrics
  • |
  • Reference
  • |
  • Related
  • |
  • Cited by
  • |
  • Materials
  • |
  • Comments
    Abstract:

    New techniques are discussed for enhancing the classification precision of deep Web databases, which include utilizing the content texts of the HTML pages containing the database entry forms as the context and a unification processing for the database attribute labels. An algorithm to find out the content texts in HTML pages is developed based on multiple statistic characteristics of the text blocks in HTML pages. The unification processing for database attributes is to let the attribute labels that are closed semantically be replaced with delegates. The domain and language knowledge found in learning samples is represented in hierarchical fuzzy sets and an algorithm for the unification processing is proposed based on the presentation. Based on the pre-computing a k-NN (k nearest neighbors) algorithm is given for deep Web database classification, where the semantic distance between two databases is calculated based on both the distance between the content texts of the HTML pages and the distance between database forms embedded in the pages. Various classification experiments are carried out to compare the classification results done by the algorithm with pre-computing and the one without the pre-computing in terms of classification precision, recall and F1 values.

    Reference
    Related
    Cited by
Get Citation

马 军,宋 玲,韩晓晖,闫 泼.基于网页上下文的Deep Web数据库分类.软件学报,2008,19(2):267-274

Copy
Share
Article Metrics
  • Abstract:
  • PDF:
  • HTML:
  • Cited by:
History
  • Received:August 31,2007
  • Revised:November 19,2007
  • Adopted:
  • Online:
  • Published:
You are the firstVisitors
Copyright: Institute of Software, Chinese Academy of Sciences Beijing ICP No. 05046678-4
Address:4# South Fourth Street, Zhong Guan Cun, Beijing 100190,Postal Code:100190
Phone:010-62562563 Fax:010-62562533 Email:jos@iscas.ac.cn
Technical Support:Beijing Qinyun Technology Development Co., Ltd.

Beijing Public Network Security No. 11040202500063