Using Classifiers to Automatically Discover Domain-Specific Deep Web Entries

Journal of Software, 2008, Vol. 19, No. 2, pp. 246-256

Fund Project: Supported by the National Natural Science Foundation of China under Grant No.60373099; the Science and Technology Development Program of Jilin Province of China under Grant No.20070533

Using Classifiers to Find Domain-Specific Online Databases Automatically


Abstract (translated from the Chinese):

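In deep Web research, general-purpose search engines (e.g., Google and Yahoo) have several shortcomings: each of them covers less than one-third of the total data in the deep Web, and, unlike on the surface Web, combining several search engines leaves the covered data essentially unchanged. Many deep Web sites provide large amounts of high-quality information, and the deep Web is gradually becoming one of the most important information resources. This paper proposes a three-classifier framework for automatically identifying domain-specific deep Web entries. Once the query interfaces have been obtained, they can be integrated, and a single unified interface can then be presented to users to make querying easier. Eight groups of large-scale experiments show that the proposed method discovers domain-specific deep Web entries accurately and efficiently.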
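The abstract names a three-classifier framework but does not describe the individual classifiers. The following is a hypothetical illustration of how a three-stage classifier cascade over crawled pages could be wired, not the authors' method. The three stages (form presence, searchable-form check, domain-vocabulary match), the names `FormExtractor`, `find_domain_entries` and `BOOK_VOCABULARY`, the threshold of two matching terms, and the sample URLs are all assumptions. Each stage here is a cheap rule-based binary filter; a real system would presumably use trained classifiers at each step.

```python
from dataclasses import dataclass, field
from html.parser import HTMLParser


@dataclass
class Form:
    """Field types and lowercase text (labels, field names) from one <form>."""
    input_types: list = field(default_factory=list)
    terms: list = field(default_factory=list)


class FormExtractor(HTMLParser):
    """Collects every <form> element on a page (input to stage 1)."""

    def __init__(self):
        super().__init__()
        self.forms = []
        self._current = None

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "form":
            self._current = Form()
        elif self._current is not None and tag in ("input", "select", "textarea"):
            # <input> defaults to type="text"; <select>/<textarea> are recorded by tag name.
            kind = (attrs.get("type") or "text").lower() if tag == "input" else tag
            self._current.input_types.append(kind)
            if attrs.get("name"):
                self._current.terms.append(attrs["name"].lower())

    def handle_endtag(self, tag):
        if tag == "form" and self._current is not None:
            self.forms.append(self._current)
            self._current = None

    def handle_data(self, data):
        # Visible text inside a form (e.g. field labels) also counts as evidence.
        if self._current is not None and data.strip():
            self._current.terms.append(data.strip().lower())


def stage1_extract_forms(html):
    """Stage 1 (hypothetical): keep only pages that contain at least one form."""
    parser = FormExtractor()
    parser.feed(html)
    parser.close()
    return parser.forms


QUERY_FIELD_TYPES = {"text", "search", "select", "textarea"}
NON_QUERY_FIELD_TYPES = {"password", "file"}  # login / upload forms are not query interfaces


def stage2_is_searchable(form):
    """Stage 2 (hypothetical): a query interface has a query field and no password/upload field."""
    types = set(form.input_types)
    return bool(types & QUERY_FIELD_TYPES) and not (types & NON_QUERY_FIELD_TYPES)


def stage3_in_domain(form, vocabulary, min_hits=2):
    """Stage 3 (hypothetical): count distinct domain terms among labels and field names."""
    words = set(" ".join(form.terms).replace("_", " ").split())
    return len(words & vocabulary) >= min_hits


def find_domain_entries(pages, vocabulary):
    """Run the cascade; return URLs with at least one form that passes every stage."""
    entries = []
    for url, html in pages.items():
        forms = stage1_extract_forms(html)
        if any(stage2_is_searchable(f) and stage3_in_domain(f, vocabulary) for f in forms):
            entries.append(url)
    return entries


# Hypothetical domain vocabulary and sample pages for a "books" domain.
BOOK_VOCABULARY = {"title", "author", "isbn", "publisher", "keyword"}
SAMPLE_PAGES = {
    "http://a.example/search": '<form><input name="title"><input name="author"><input type="submit"></form>',
    "http://b.example/login": '<form>User <input name="user"><input type="password" name="pw"></form>',
    "http://c.example/flights": '<form><input name="from"><input name="to"><select name="class"></select></form>',
    "http://d.example/about": "<p>No forms here</p>",
}

print(find_domain_entries(SAMPLE_PAGES, BOOK_VOCABULARY))  # ['http://a.example/search']
```

In this toy run only the book-search page survives: the login form is rejected at stage 2 and the flight form at stage 3, while the page without a form never gets past stage 1. Ordering the cheap filters first means the later, presumably costlier checks only see the few pages that remain.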

Abstract (English version):

In the hidden Web domain, general-purpose search engines (e.g., Google and Yahoo) have shortcomings. Each of them covers less than one-third of the data stored in hidden Web databases, and, unlike on the surface Web, combining several of them leaves coverage roughly unchanged. The hidden Web is a highly important information source, since the content provided by many hidden Web sites is often of very high quality. This paper proposes a three-classifier framework to automatically identify domain-specific hidden Web entries. The query interfaces obtained in this way can then be integrated into a unified interface through which users pose their queries. Eight large-scale experiments demonstrate that the technique finds domain-specific hidden Web entries accurately and efficiently.
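The abstract only states that the discovered query interfaces are integrated into a unified interface; it does not say how. The following is a minimal sketch of one naive approach, assuming that interfaces can be matched at the level of field labels through a hand-written synonym table. `SYNONYMS`, `normalize`, `unify`, `min_support` and the sample interfaces are all hypothetical and are not taken from the paper.

```python
from collections import Counter

# Hypothetical synonym table mapping raw field labels to canonical attribute names.
SYNONYMS = {
    "book title": "title",
    "name of book": "title",
    "writer": "author",
    "author name": "author",
    "isbn number": "isbn",
}


def normalize(label):
    """Lowercase, collapse underscores/whitespace, then apply the synonym table."""
    key = " ".join(label.lower().replace("_", " ").split())
    return SYNONYMS.get(key, key)


def unify(interfaces, min_support=2):
    """Keep canonical attributes found in at least `min_support` interfaces,
    ordered by how many interfaces contain them (ties broken alphabetically)."""
    counts = Counter()
    for fields in interfaces:
        counts.update({normalize(f) for f in fields})  # count each attribute once per interface
    kept = [attr for attr, n in counts.items() if n >= min_support]
    return sorted(kept, key=lambda attr: (-counts[attr], attr))


# Hypothetical field labels taken from three book-search interfaces.
INTERFACES = [
    ["Book Title", "Author", "ISBN"],
    ["title", "Writer", "Publisher"],
    ["Name of Book", "author_name", "Price range"],
]

print(unify(INTERFACES))  # ['author', 'title']
```

Requiring an attribute to appear in several source interfaces keeps the unified form short; lowering `min_support` to 1 instead yields the union of all attributes.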

Cite this article:

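王辉 (Wang Hui), 刘艳威 (Liu Yanwei), 左万利 (Zuo Wanli). Using classifiers to automatically discover domain-specific deep Web entries. Journal of Software (软件学报), 2008, 19(2): 246-256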

History

Received: 2007-08-02
Last revised: 2007-11-06
