2005, 16(4):540-552.
Abstract: Most existing peer-to-peer (P2P) systems support only simple title-based search, so users cannot search data by content. Top-k queries are widely used in search engines with great success. However, processing top-k queries in a pure P2P network is challenging because such a system is dynamic and decentralized. This paper proposes an efficient hierarchical top-k query processing algorithm based on histograms. First, a distributed query processing model is proposed that evaluates top-k queries hierarchically. Ranking and merging of documents are distributed across the peers, which takes full advantage of the computing resources of the network. Next, a histogram is constructed for each peer from the top-k results it returns and is used to estimate an upper bound on the scores that peer can contribute. Using the histogram information, the query is sent only to the most promising peers, which greatly improves search efficiency. Experimental results show that the top-k query improves query effectiveness and that the histogram improves query efficiency.
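The idea of hierarchical merging with histogram-based peer selection can be illustrated with a minimal sketch. It is not the paper's algorithm: the `Peer` class, the bucket count, the assumption that scores lie in [0, 1], and the rule of skipping a child whose estimated upper bound cannot beat the current k-th score are all illustrative choices, and the histogram here is built directly from stored scores rather than from results returned by earlier queries.

```python
import heapq
import math
from dataclasses import dataclass, field
from itertools import chain

NUM_BUCKETS = 10  # assumed: scores lie in [0, 1], split into 10 equal buckets


def build_histogram(scores):
    """Count scores per bucket (bucket index -> count)."""
    hist = {}
    for s in scores:
        b = min(int(s * NUM_BUCKETS), NUM_BUCKETS - 1)
        hist[b] = hist.get(b, 0) + 1
    return hist


def upper_bound(hist):
    """Upper edge of the highest non-empty bucket; 0.0 for an empty histogram."""
    return (max(hist) + 1) / NUM_BUCKETS if hist else 0.0


@dataclass
class Peer:
    name: str
    local_scores: list
    children: list = field(default_factory=list)
    histogram: dict = field(default_factory=dict, init=False)

    def __post_init__(self):
        # Assumed: the histogram summarizes scores from this peer's whole subtree.
        self.histogram = build_histogram(self.local_scores)
        for child in self.children:
            for b, n in child.histogram.items():
                self.histogram[b] = self.histogram.get(b, 0) + n


def top_k(peer, k, visited):
    """Return the k best scores in the subtree of `peer`, best first.

    Each peer ranks its own documents and merges its children's results,
    so ranking and merging are spread over the hierarchy. `visited`
    records which peers actually received the query.
    """
    visited.append(peer.name)
    results = heapq.nlargest(k, peer.local_scores)
    # Try the most promising children first.
    ordered = sorted(peer.children,
                     key=lambda c: upper_bound(c.histogram), reverse=True)
    for child in ordered:
        kth = results[-1] if len(results) >= k else -math.inf
        if upper_bound(child.histogram) <= kth:
            continue  # this child cannot improve the current top k
        results = heapq.nlargest(k, chain(results, top_k(child, k, visited)))
    return results


a = Peer("a", [0.95, 0.91, 0.3])
b = Peer("b", [0.45, 0.1])
c = Peer("c", [0.82])
root = Peer("root", [0.55, 0.2], [a, b, c])
visited = []
print(top_k(root, 2, visited), visited)
```

In this toy network only `root` and `a` receive the query: once `a` has returned its results, the estimated upper bounds of `b` and `c` fall below the current second-best score, so both are skipped.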
2005, 16(7):1270-1281.
Abstract: Traditionally, SQL is the main interface for accessing data in relational databases. However, inexperienced end users find SQL's complicated syntax difficult to learn. Keyword-based information retrieval over relational databases lets users obtain information without any knowledge of SQL or the underlying database schema, much as they would with a common search engine. This paper describes the design and implementation of SEEKER, a system supporting keyword-based information retrieval over relational databases. While some existing systems support searching text attributes in relational databases, SEEKER can also search database metadata and numeric attributes. Moreover, SEEKER employs an improved ranking function and supports top-k queries. Experimental results show that SEEKER achieves good retrieval performance.
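To make the notion of keyword search over text attributes, numeric attributes, and metadata concrete, here is a small sketch using Python's built-in `sqlite3` module. It is not SEEKER's implementation: the function names, the full-table scan, and the scoring weights (2 for a value match, 1 for a table or column name match) are assumptions chosen for illustration, since the abstract does not specify the ranking function.

```python
import sqlite3


def is_number(text):
    """True if the keyword can be read as a number."""
    try:
        float(text)
        return True
    except ValueError:
        return False


def keyword_search(conn, keywords, k=3):
    """Return up to k (score, table, row) hits, best first (naive full scan)."""
    hits = []
    tables = [r[0] for r in conn.execute(
        "SELECT name FROM sqlite_master WHERE type='table'")]
    for table in tables:
        columns = [r[1] for r in conn.execute(f'PRAGMA table_info("{table}")')]
        # Metadata: table and column names that a keyword may match.
        meta = {table.lower(), *(c.lower() for c in columns)}
        for row in conn.execute(f'SELECT * FROM "{table}"'):
            score = 0
            for kw in keywords:
                kw_l = kw.lower()
                if kw_l in meta:
                    score += 1  # assumed weight for a metadata match
                for value in row:
                    if isinstance(value, str) and kw_l in value.lower():
                        score += 2  # assumed weight for a text match
                    elif (isinstance(value, (int, float))
                          and is_number(kw) and float(kw) == value):
                        score += 2  # assumed weight for a numeric match
            if score:
                hits.append((score, table, row))
    hits.sort(key=lambda h: h[0], reverse=True)  # stable: ties keep scan order
    return hits[:k]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE books (title TEXT, author TEXT, year INTEGER)")
conn.executemany("INSERT INTO books VALUES (?, ?, ?)", [
    ("Database Systems", "Ullman", 2005),
    ("Compilers", "Aho", 1986),
    ("Database Design", "Codd", 1986),
])
for hit in keyword_search(conn, ["database", "2005"], k=2):
    print(hit)
```

The user supplies only keywords; the schema is discovered from the database catalog, so no SQL or schema knowledge is needed on the user's side. A practical system would use indexes rather than scanning every row.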