Abstract: To improve query efficiency, high-dimensional metric space indexes often use K-means clustering to estimate the data distribution. In previous work, however, the clustering parameters are usually chosen heuristically. This paper presents a new high-dimensional indexing approach, the cluster-splitting-based high-dimensional B+-tree. Cluster splitting partitions the data space more finely, which reduces the cost of data access. The paper analyzes the relationship between the clusters and the query cost. Based on a query cost model, it derives formulas for the "optimal" clustering parameters, which in theory minimize the query cost. Experimental results show that the proposed method is more efficient than iDistance, the M-tree, and sequential scan. They also show that the parameters computed by the formulas are very close to the actual optimal values.