Abstract:Most of the existing peer-to-peer (P2P) systems only support simple title-based search, and users cannot search the data based on their content. Top-k query is widely used in the search engine and gains great success. However, Processing top-k query in pure P2P network is very challenging because a P2P system is a dynamic and decentralized system. An efficient hierarchical top-k query processing algorithm based on histogram is proposed. First, a distributed query processing model for top-k query is proposed. It does top-k query in a hierarchical way. Ranking and merging of documents are distributed across the peers, which takes full advantage of the computing resource of the network. Next, a histogram is constructed for each peer according to the top k results returned by the peer, and used to estimate the possible upper bound of the score for the peer. By the histogram information, the most possible peers are selected to send the query, so as to greatly improve the search efficiency. Experimental results show that the top-k query improves the query effectiveness, and the histogram improves the query efficiency.