Collecting and Storing Web Archive Based on Page Block

Journal of Software, Volume 19, Issue 2, 2008, pp. 275-290

Authors: SONG Jie, WANG Da-Ling, BAO Yu-Bin, SHEN De-Rong

Abstract:

This paper proposes a page-block-based approach to collecting and storing Web archives. It describes in detail the algorithms for layout-based page partitioning, topic extraction from blocks, version comparison, and incremental storage. A prototype system was implemented and tested to verify the approach. Theoretical analysis and experiments show that the approach is well suited to Web archive management and provides a valuable data resource for Web-archive-based query, search, data mining, and knowledge discovery applications.
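
The abstract names block-level version comparison and incremental storage but does not say how they work. The following is a minimal, hypothetical Python sketch of one common way to realize these two ideas, assuming each page has already been partitioned into text blocks: every distinct block is stored once under a content hash (SHA-1 here, an assumption), a version is recorded as the ordered list of its block hashes, and two versions are compared by set difference. The names `BlockArchive`, `block_id`, `store_version`, `diff`, and `rebuild` are illustrative only and are not taken from the paper; the paper's own partitioning, topic-extraction, and comparison algorithms are not reproduced here.

```python
import hashlib
from dataclasses import dataclass, field


def block_id(text: str) -> str:
    """Content hash used as a block's identity (an assumption of this sketch)."""
    return hashlib.sha1(text.encode("utf-8")).hexdigest()


@dataclass
class BlockArchive:
    """Hypothetical block store: each distinct block body is kept only once."""

    blocks: dict[str, str] = field(default_factory=dict)  # hash -> block text
    versions: dict[str, list[list[str]]] = field(default_factory=dict)  # url -> versions

    def store_version(self, url: str, page_blocks: list[str]) -> list[str]:
        """Archive one crawl of `url`; return the hashes of newly stored blocks."""
        ids: list[str] = []
        new_ids: list[str] = []
        for text in page_blocks:
            bid = block_id(text)
            if bid not in self.blocks:  # incremental: store unseen content only
                self.blocks[bid] = text
                new_ids.append(bid)
            ids.append(bid)
        self.versions.setdefault(url, []).append(ids)
        return new_ids

    def diff(self, url: str, old: int, new: int) -> tuple[set[str], set[str]]:
        """Compare two versions by block identity; return (added, removed) hashes."""
        before = set(self.versions[url][old])
        after = set(self.versions[url][new])
        return after - before, before - after

    def rebuild(self, url: str, index: int) -> list[str]:
        """Reassemble the blocks of one archived version, in page order."""
        return [self.blocks[bid] for bid in self.versions[url][index]]


if __name__ == "__main__":
    archive = BlockArchive()
    url = "http://example.com/"  # hypothetical page
    archive.store_version(url, ["header", "news A", "footer"])
    stored = archive.store_version(url, ["header", "news B", "footer"])
    added, removed = archive.diff(url, 0, 1)
    print(len(stored))  # 1: only "news B" is new
    print([archive.blocks[b] for b in added], [archive.blocks[b] for b in removed])  # ['news B'] ['news A']
    print(archive.rebuild(url, 1))  # ['header', 'news B', 'footer']
```

Because unchanged blocks such as headers and footers hash to the same key, a second crawl adds only the blocks whose content changed. In this simple scheme an edited block appears as one removed and one added block; matching edited blocks to their predecessors would require a more detailed comparison step.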

Key words: Web archive; page partition; page block

Citation:

SONG Jie, WANG Da-Ling, BAO Yu-Bin, SHEN De-Rong. Collecting and Storing Web Archive Based on Page Block. Journal of Software, 2008, 19(2): 275-290.

History

Received: August 31, 2007
Revised: October 19, 2007
