Collecting and Storing Web Archive Based on Page Block

    Abstract:

    This paper proposes an approach to collecting and storing Web archives based on page blocks. It describes in detail the algorithms for layout-based page partitioning, block topic extraction, version comparison, and incremental storage. A prototype system was implemented and tested to verify the approach. Theoretical analysis and experiments show that the approach is well suited to Web archive management and provides a valuable data resource for Web-archive-based query, search, data mining, and knowledge discovery applications.
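
As a rough illustration of how block-level version comparison and incremental storage can fit together, the following Python sketch stores each page version as a manifest of block digests and keeps each distinct block body only once. It is not the paper's algorithm: the `BlockArchive` class, the `block_digest` helper, the use of SHA-256, and the assumption that a page has already been partitioned into named blocks are all hypothetical choices made for this example.

```python
import hashlib


def block_digest(text: str) -> str:
    """Return a SHA-256 hex digest of a block's whitespace-normalized text."""
    normalized = " ".join(text.split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


class BlockArchive:
    """Hypothetical in-memory archive; each distinct block body is stored once."""

    def __init__(self):
        self.contents = {}   # digest -> block text
        self.versions = []   # one {block_id: digest} manifest per crawl

    def add_version(self, blocks: dict) -> list:
        """Store a page version and return the ids of new or changed blocks."""
        previous = self.versions[-1] if self.versions else {}
        manifest = {}
        changed = []
        for block_id, text in blocks.items():
            digest = block_digest(text)
            manifest[block_id] = digest
            # Version comparison: a block differs if its digest differs.
            if previous.get(block_id) != digest:
                changed.append(block_id)
            # Incremental storage: unchanged bodies are not stored again.
            self.contents.setdefault(digest, text)
        self.versions.append(manifest)
        return changed

    def restore(self, version_index: int) -> dict:
        """Rebuild the blocks of a stored version from its manifest."""
        manifest = self.versions[version_index]
        return {bid: self.contents[d] for bid, d in manifest.items()}


archive = BlockArchive()
first = archive.add_version({"nav": "Home | News", "main": "Story A"})
second = archive.add_version({"nav": "Home | News", "main": "Story B"})
```

In this sketch the second crawl adds only the changed `main` block to `contents`, while the unchanged `nav` block is referenced by both manifests. A real system would also need the layout-based partitioning and topic extraction steps that the abstract mentions, which are outside the scope of this example.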

Citation

Song Jie, Wang Daling, Bao Yubin, Shen Derong. Collecting and Storing Web Archive Based on Page Block. Journal of Software, 2008, 19(2): 275-290

History
  • Received: August 31, 2007
  • Revised: October 19, 2007
Copyright: Institute of Software, Chinese Academy of Sciences Beijing ICP No. 05046678-4