Efficient World-Wide-Web Information Gathering

    Abstract:

    As the information available through the World-Wide-Web becomes overwhelming, efficient information gathering (IG) tools are necessary. Because network resources are expensive, IG is a resource-bounded task. The main purpose of this paper is to find an efficient gathering method for a specific topic. The paper presents methods for predicting a page's content without downloading it, designs different controlling strategies, and defines several kinds of page-downloading priority measures. An IG system, TH-Gatherer, was built to test the methods, and several experiments were carried out. The experiments show that the content of candidate pages can be predicted approximately without downloading them. When the priority-based gathering strategy and the hybrid measure are used, the gathering efficiency is four times that of the breadth-first search (BFS) strategy used by many current IG tools (including crawlers and off-line browsing tools). The method presented in this paper is suitable for resource-bounded, topic-specific information gathering.
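    The contrast between BFS and priority-based gathering can be illustrated with a minimal sketch. It is not the TH-Gatherer implementation: the toy link graph `TOY_WEB`, the topic set `TOPIC`, and the anchor-text scoring in `predicted_score` are all hypothetical stand-ins, since the abstract does not specify the paper's prediction methods or priority measures. The sketch only shows the general idea of ranking undownloaded pages by a predicted relevance and spending a fixed download budget on the best candidates first.

```python
import heapq

# Hypothetical toy web: url -> (page text, [(anchor text, target url), ...]).
TOY_WEB = {
    "root": ("portal", [("sports news", "s1"),
                        ("crawler research", "t1"),
                        ("weather", "w1")]),
    "s1": ("football scores", [("more sports", "s2")]),
    "s2": ("tennis scores", []),
    "w1": ("rain tomorrow", []),
    "t1": ("web crawler design", [("crawler priority queue", "t2")]),
    "t2": ("crawler frontier ordering", []),
}
TOPIC = {"crawler"}  # assumed topic vocabulary, for illustration only


def predicted_score(anchor_text):
    """Guess a page's relevance from its anchor text, before any download."""
    words = anchor_text.lower().split()
    return sum(w in TOPIC for w in words) / max(len(words), 1)


def is_relevant(page_text):
    """Judge a downloaded page by its actual text."""
    return any(w in TOPIC for w in page_text.lower().split())


def gather(start, budget, use_priority):
    """Download at most `budget` pages; return the relevant URLs found."""
    seen = {start}
    counter = 0  # tie-breaker: equal priorities keep discovery order
    frontier = [(0.0, counter, start)]
    relevant = []
    downloads = 0
    while frontier and downloads < budget:
        if use_priority:
            _, _, url = heapq.heappop(frontier)  # best predicted page first
        else:
            _, _, url = frontier.pop(0)          # plain FIFO, i.e. BFS
        page_text, links = TOY_WEB[url]          # stands in for a download
        downloads += 1
        if is_relevant(page_text):
            relevant.append(url)
        for anchor, target in links:
            if target in seen:
                continue
            seen.add(target)
            counter += 1
            entry = (-predicted_score(anchor), counter, target)
            if use_priority:
                heapq.heappush(frontier, entry)
            else:
                frontier.append(entry)
    return relevant


print(gather("root", budget=3, use_priority=False))  # ['t1']
print(gather("root", budget=3, use_priority=True))   # ['t1', 't2']
```

    With a budget of three downloads, the FIFO frontier spends one download on an off-topic page, while the priority frontier follows the on-topic links first. The size of the gain in this toy graph says nothing about the fourfold figure reported in the abstract, which comes from the paper's own experiments.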

Get Citation

Tian Fanjiang, Wang Xidong, Wang Dingxing. Efficient World-Wide-Web Information Gathering. Journal of Software, 2001, 12(1):33-40

Copyright: Institute of Software, Chinese Academy of Sciences