Abstract: In the hidden Web domain, general-purpose search engines (e.g., Google and Yahoo) have clear shortcomings. They cover less than one-third of the data stored in document databases, and, unlike on the surface Web, combining them does not help: together they still cover roughly the same data. The hidden Web is nonetheless an important information source, because the content provided by many hidden Web sites is often of very high quality. This paper proposes a three-step framework that automatically identifies domain-specific hidden Web entries. The query interfaces obtained this way can then be integrated into a unified interface through which users issue their queries. Eight large-scale experiments demonstrate that the technique finds domain-specific hidden Web entries accurately and efficiently.