Wang Hui, Liu Yanwei, Zuo Wanli. Using Classifiers to Find Domain-Specific Online Databases Automatically. Journal of Software, 2008, 19(2): 246-256 |
Using Classifiers to Automatically Discover Domain-Specific Deep Web Entries (translated from the Chinese title) |
Using Classifiers to Find Domain-Specific Online Databases Automatically |
Received: 2007-08-02 Revised: 2007-11-06 |
DOI: |
Chinese keywords (translated): deep Web; surface Web; deep Web entry; search form |
English keywords: deep Web; hidden Web; surface Web; hidden Web entry; searchable form |
Funding: Supported by the National Natural Science Foundation of China under Grant No.60373099; the Science and Technology Development Program of Jilin Province of China under Grant No.20070533 |
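Chinese abstract (translated): |
In deep Web research, general-purpose search engines (such as Google and Yahoo) have many shortcomings: the amount of data each of them can cover is less than one-third of the total deep Web data, and, unlike the situation in the surface Web, the amount of data covered when several search engines are combined remains essentially unchanged. Many deep Web sites provide large amounts of high-quality information, and the deep Web is gradually becoming one of the most important information resources. This paper proposes a three-classifier framework for automatically identifying domain-specific deep Web entries. Once the query interfaces are obtained, they can be integrated, and a unified interface can then be presented to users to make querying easier. Eight groups of large-scale experiments verify that the proposed method can discover domain-specific deep Web entries accurately and efficiently. |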
English abstract: |
In the hidden Web domain, general-purpose search engines (e.g., Google and Yahoo) have shortcomings. Each covers less than one-third of the data stored in document databases. Unlike in the surface Web, combining them yields roughly the same coverage. The hidden Web is a highly important information source, since the content provided by many hidden Web sites is often of very high quality. This paper proposes a three-classifier framework to automatically identify domain-specific hidden Web entries. The query interfaces obtained can then be integrated into a unified interface that is presented to users for querying. Eight large-scale experiments demonstrate that the technique can find domain-specific hidden Web entries accurately and efficiently. |