A High Dimensional Data Indexing Technique Based on Max Gap Space Mapping

微信服务号

微信订阅号

2025-4-7- 3

Home > Archive>Volume 18, Issue 6, 2007 >1419-1428

A High Dimensional Data Indexing Technique Based on Max Gap Space Mapping
DOI:
                        
                    
Author:
                        WANG Guo-RenWANG Guo-Ren

Find this author on CNKI
Find this author on BaiDu
Search for this author on this site
HUANG Jian-MeiHUANG Jian-Mei

Find this author on CNKI
Find this author on BaiDu
Search for this author on this site
WANG BinWANG Bin

Find this author on CNKI
Find this author on BaiDu
Search for this author on this site
HAN Dong-HongHAN Dong-Hong

Find this author on CNKI
Find this author on BaiDu
Search for this author on this site
QIAO Bai-YouQIAO Bai-You

Find this author on CNKI
Find this author on BaiDu
Search for this author on this site
YU GeYU Ge

Find this author on CNKI
Find this author on BaiDu
Search for this author on this site

                    
Affiliation:
Clc Number:
Fund Project:

Article

Figures

Metrics

Reference [20]

Related [20]

Cited by [3]

Materials

Comments

Abstract:

In the similarity query processing based on high dimensional indexing, the searching space is usually narrowed down by pruning the inactive subspaces which do not contain any query results. However, among the active subspaces, some of them do not contain any query results at all, those are called false active subspaces. It is obvious that the performance of query processing degrades in the presence of false active subspaces. The problem becomes seriously in the case of high dimensional data. The experiment in this paper shows that the number of accesses to false active subspaces increases as the dimensionality increases. In order to overcome the problem, a space mapping approach is proposed to reduce such unnecessary accesses. For a given query, it can be refined by filtering within its mapped space. A maximal gap space mapping strategy, MaxGapMapping, is proposed to improve the efficiency of the refinement processing. An index structure——MS-tree, the algorithms of construction, and query processing based on this refining method are designed and implemented. Finally, the performance of MS-tree is systemically compared with that of other competitors in terms of range queries based on a real data set.

Key words:high dimensional index; query refining; false active subspace

Reference

[1]Bohm C,Berchtold S,Keim DA.Seaching in high-dimensional spaces.ACM Computing Surveys,2001,33(3):322-373.

[2]Berkmann N,Krigel HP,Schneider R,Seeger B.The R*-tree:An efficient and robust access method for points and rectangles.SIGMOD Record,1990,19(2):322-331.

[3]Katayama N,Satoh S.The SR-tree:An index structure for high-dimensional nearest neighbor queries.SIGMOD Record,1997,26(2):369-380.

[4]White DA,Jain R.Similarity indexing with the SS-tree.In:Su SYW,ed.Proc.of the 20th Int'l Conf.on Data Engineering.New Orleans:IEEE Computer Society,1996.516-523.

[5]Lin KI,Jagadish HV,Faloutsos C.The TV-tree:An index structure for high-dimensional Data.VLDB Journal,1994,3(4):517-542.

[6]Fu A,Chan P,Cheung Y,Moon Y.Dynamical VP-tree indexing for n-nearest neighbor search given pair-wise distances.VLDB Journal,2000,9(2):154-173.

[7]Bozkaya T,Ozsoyoglu M.Distance-Based indexing for high-dimensional metric spaces.SIGMOD Record,1997,26(2):357-368.

[8]Ciaccia P,Patella M,Zezula P.M-tree:An efficient access method for similarity search in metric spaces.In:Jarke M,Carey MJ,Dittrich KR,Lochovsky FH,Loucopoulos P,Jeusfeld MA,eds.Proc.of the 23rd Int'l Conf.on Very Large Data Bases.San Fransisco:Morgan Kaufamann Publishers,1997.426-435.

[9]Cui B,Ooi BC,Su J,Tan K.Contorting high dimensional data for efficient main memory processing.In:Halevy AY,Ives ZG,Doan A,eds.Proc.of the ACM SIGMOD Int'l Conf.on Management of Data.San Diego:ACM,2003.479-490.

[10]Jr.Traina C,Traina A,Seeger B,Faloutsos C.Slim-trees:High performance metric trees minimizing overlap between nodes.In:Zaniolo C,Lockemann PC,Scholl MH,Grust T,eds.Proc.of the 7th Int'l Conf.on Extending Database Technology.Springer-Verlag,2000.51-65.

[11]Zhou X,Wang G,Yu JX,Yu G.M+-tree:A new dynamical multidimensional index for metric spaces.In:Schewe KD,Zhou X,eds.Proc.of the 14th Australasian Database Conf.Australian Computer Society,2003.161-168.

[12]Chakrabarti K,Mehrotra S.Local dimensionality reduction:A new approach to indexing high dimensional spaces.In:Abbadi AE,rodie ML,Chakravarthy S,Dayal U,Kamel N,Schlageter G,Whang KY,eds.Proc.of the 26th Int'l Conf.on Very Large Data Bases.San Fransisco:Morgan Kaufamann Publishers,2000.89-100.

[13]Wang G,Lu H,Yu G,Bao Y.Managing very large document collections using semantics.Journal of Computer Science and Technology,2003,18(3):403-406.

[14]Ciaccia P,Patella M.Bulk loading the M-tree.In:Roddick JF,ed.Proc.of the 9th Australasian Database Conf.Perth:Springer-Verlag,1998.15-26.

[15]Zhou XM,Wang GR,Chang LZ,Fan D.Bulk-Loading M+-tree.Mini-Micro Systems,2006,27(2):295-299 (in Chinese with English abstract).

[16]Feng Y,Cao K,Cao Z.A multidimensional index structure for fast similarity retrieval.Journal of Software,2002,13(8):1678-1685 (in Chinese with English abstract).http://www.jos.org.cn/1000-9825/13/1678.pdf

[17]Dong D,Liu Z,Xue X.VA-trie:A new and efficient high dimensional index structure for approximate k nearest neighbor query.Journal of Computer Research and Development,2005,42(12):2213-2218 (in Chinese with English abstract).

[15]周项敏,王国仁,常立喆,范丹.批量构建M+-tree.小型微型计算机系统,2006,27(2):295-299.

[16]冯玉才,曹奎,曹忠升.一种支持快速相似性检索的多维索引结构.软件学报,2002,13(8):1678-1685.http://www.jos.org.cn/ 1000-9825/13/1678.pdf

[17]董道国,刘振中,薛向阳.VA-Trie:一种用于近似k近邻查询的高维索引结构.计算机研究与发展,2005,42(12):2213-2218.

Get Citation

王国仁,黄健美,王斌,韩东红,乔百友,于戈.基于最大间隙空间映射的高维数据索引技术.软件学报,2007,18(6):1419-1428

Copy

Article Metrics

Abstract:4692
PDF: 6071
HTML: 0
Cited by: 0

History

Received:November 07,2005
Revised:August 16,2006
Adopted:
Online:
Published:

You are the first2033450Visitors
Copyright: Institute of Software, Chinese Academy of Sciences Beijing ICP No. 05046678-4
Address：4# South Fourth Street, Zhong Guan Cun, Beijing 100190,Postal Code：100190
Phone：010-62562563 Fax：010-62562533 Email：jos@iscas.ac.cn
Technical Support：Beijing Qinyun Technology Development Co., Ltd.

Beijing Public Network Security No. 11040202500063

微信服务号

微信订阅号

Get Citation

Share

微信扫一扫：分享

Article Metrics

History