基于最大间隙空间映射的高维数据索引技术

微信服务号

微信订阅号

2025年5月17日 5:29 星期六

首页 > 过刊浏览>2007年第18卷第6期 >1419-1428

基于最大间隙空间映射的高维数据索引技术
DOI:
                        
                    
CSTR:
                        
                    
作者:
                        王国仁王国仁
东北大学,信息科学与工程学院,辽宁,沈阳,110004;东北大学,计算中心(网络中心),辽宁,沈阳,110004
在期刊界中查找
在百度中查找
在本站中查找
黄健美黄健美
东北大学,信息科学与工程学院,辽宁,沈阳,110004
在期刊界中查找
在百度中查找
在本站中查找
王斌王斌
东北大学,信息科学与工程学院,辽宁,沈阳,110004
在期刊界中查找
在百度中查找
在本站中查找
韩东红韩东红
东北大学,信息科学与工程学院,辽宁,沈阳,110004
在期刊界中查找
在百度中查找
在本站中查找
乔百友乔百友
东北大学,信息科学与工程学院,辽宁,沈阳,110004
在期刊界中查找
在百度中查找
在本站中查找
于戈于戈
东北大学,信息科学与工程学院,辽宁,沈阳,110004
在期刊界中查找
在百度中查找
在本站中查找

                    
作者单位:
作者简介:
通讯作者:
中图分类号:
基金项目:Supported by the National Natural Science Foundation of China under Grant Nos.60273079, 60473074, 60573089 (国家自然科学基金); the National Basic Research Program of China under Grant No. 2006CB303103 (国家重点基础研究发展计划(973)); the National Research Foundation for the Doctoral Program of Higher Education of China under Grant No.DP0345710 (国家教育部博士点基金)

A High Dimensional Data Indexing Technique Based on Max Gap Space Mapping

Author:

Affiliation:

Fund Project:

摘要

图/表

访问统计

参考文献 [20]

相似文献

引证文献

资源附件

文章评论

摘要:

在基于高维索引技术的相似性查询处理中,通常通过过滤那些不包含任何查询结果的非活动子空间来不断缩减搜索空间.但是在活动子空间中,有些可能根本就不包含任何查询结果,这样的活动子空间被称为假活动子空间.显然,查询处理性能会随着假活动子空间访问次数的增加而下降.这一问题在高维数据情况下将会变得更加严重,实验显示出随着维数的增加,假活动子空间的访问次数也会增加.为了解决这一问题,提出了一种空间映射方法来减少这种不必要的访问.对于一个给定的查询,可以通过在映射空间内进一步精炼该查询来过滤假活动子空间.为了提高映射空间内查询精炼的处理效率,提出了一个最大间隙空间映射策略--MaxGapMapping.基于这种映射方法,设计并实现了一种新的索引结构--MS-tree,给出了索引的构建算法和范围查询处理算法.最后对MS-tree及其他索引结构的性能进行了详细的比较和分析.

关键词:高维索引;查询精炼;假活动子空间

Abstract:

In the similarity query processing based on high dimensional indexing, the searching space is usually narrowed down by pruning the inactive subspaces which do not contain any query results. However, among the active subspaces, some of them do not contain any query results at all, those are called false active subspaces. It is obvious that the performance of query processing degrades in the presence of false active subspaces. The problem becomes seriously in the case of high dimensional data. The experiment in this paper shows that the number of accesses to false active subspaces increases as the dimensionality increases. In order to overcome the problem, a space mapping approach is proposed to reduce such unnecessary accesses. For a given query, it can be refined by filtering within its mapped space. A maximal gap space mapping strategy, MaxGapMapping, is proposed to improve the efficiency of the refinement processing. An index structure——MS-tree, the algorithms of construction, and query processing based on this refining method are designed and implemented. Finally, the performance of MS-tree is systemically compared with that of other competitors in terms of range queries based on a real data set.

Key words:high dimensional index; query refining; false active subspace

参考文献

[1]Bohm C,Berchtold S,Keim DA.Seaching in high-dimensional spaces.ACM Computing Surveys,2001,33(3):322-373.

[2]Berkmann N,Krigel HP,Schneider R,Seeger B.The R*-tree:An efficient and robust access method for points and rectangles.SIGMOD Record,1990,19(2):322-331.

[3]Katayama N,Satoh S.The SR-tree:An index structure for high-dimensional nearest neighbor queries.SIGMOD Record,1997,26(2):369-380.

[4]White DA,Jain R.Similarity indexing with the SS-tree.In:Su SYW,ed.Proc.of the 20th Int'l Conf.on Data Engineering.New Orleans:IEEE Computer Society,1996.516-523.

[5]Lin KI,Jagadish HV,Faloutsos C.The TV-tree:An index structure for high-dimensional Data.VLDB Journal,1994,3(4):517-542.

[6]Fu A,Chan P,Cheung Y,Moon Y.Dynamical VP-tree indexing for n-nearest neighbor search given pair-wise distances.VLDB Journal,2000,9(2):154-173.

[7]Bozkaya T,Ozsoyoglu M.Distance-Based indexing for high-dimensional metric spaces.SIGMOD Record,1997,26(2):357-368.

[8]Ciaccia P,Patella M,Zezula P.M-tree:An efficient access method for similarity search in metric spaces.In:Jarke M,Carey MJ,Dittrich KR,Lochovsky FH,Loucopoulos P,Jeusfeld MA,eds.Proc.of the 23rd Int'l Conf.on Very Large Data Bases.San Fransisco:Morgan Kaufamann Publishers,1997.426-435.

[9]Cui B,Ooi BC,Su J,Tan K.Contorting high dimensional data for efficient main memory processing.In:Halevy AY,Ives ZG,Doan A,eds.Proc.of the ACM SIGMOD Int'l Conf.on Management of Data.San Diego:ACM,2003.479-490.

[10]Jr.Traina C,Traina A,Seeger B,Faloutsos C.Slim-trees:High performance metric trees minimizing overlap between nodes.In:Zaniolo C,Lockemann PC,Scholl MH,Grust T,eds.Proc.of the 7th Int'l Conf.on Extending Database Technology.Springer-Verlag,2000.51-65.

[11]Zhou X,Wang G,Yu JX,Yu G.M+-tree:A new dynamical multidimensional index for metric spaces.In:Schewe KD,Zhou X,eds.Proc.of the 14th Australasian Database Conf.Australian Computer Society,2003.161-168.

[12]Chakrabarti K,Mehrotra S.Local dimensionality reduction:A new approach to indexing high dimensional spaces.In:Abbadi AE,rodie ML,Chakravarthy S,Dayal U,Kamel N,Schlageter G,Whang KY,eds.Proc.of the 26th Int'l Conf.on Very Large Data Bases.San Fransisco:Morgan Kaufamann Publishers,2000.89-100.

[13]Wang G,Lu H,Yu G,Bao Y.Managing very large document collections using semantics.Journal of Computer Science and Technology,2003,18(3):403-406.

[14]Ciaccia P,Patella M.Bulk loading the M-tree.In:Roddick JF,ed.Proc.of the 9th Australasian Database Conf.Perth:Springer-Verlag,1998.15-26.

[15]Zhou XM,Wang GR,Chang LZ,Fan D.Bulk-Loading M+-tree.Mini-Micro Systems,2006,27(2):295-299 (in Chinese with English abstract).

[16]Feng Y,Cao K,Cao Z.A multidimensional index structure for fast similarity retrieval.Journal of Software,2002,13(8):1678-1685 (in Chinese with English abstract).http://www.jos.org.cn/1000-9825/13/1678.pdf

[17]Dong D,Liu Z,Xue X.VA-trie:A new and efficient high dimensional index structure for approximate k nearest neighbor query.Journal of Computer Research and Development,2005,42(12):2213-2218 (in Chinese with English abstract).

[15]周项敏,王国仁,常立喆,范丹.批量构建M+-tree.小型微型计算机系统,2006,27(2):295-299.

[16]冯玉才,曹奎,曹忠升.一种支持快速相似性检索的多维索引结构.软件学报,2002,13(8):1678-1685.http://www.jos.org.cn/ 1000-9825/13/1678.pdf

[17]董道国,刘振中,薛向阳.VA-Trie:一种用于近似k近邻查询的高维索引结构.计算机研究与发展,2005,42(12):2213-2218.

引用本文

王国仁,黄健美,王斌,韩东红,乔百友,于戈.基于最大间隙空间映射的高维数据索引技术.软件学报,2007,18(6):1419-1428

复制

文章指标

点击次数:4702
下载次数: 6122
HTML阅读次数: 0
引用次数: 0

历史

收稿日期:2005-11-07
最后修改日期:2006-08-16
录用日期:
在线发布日期:
出版日期:

微信服务号

微信订阅号

引用本文

相关视频

分享

文章指标

历史

文章二维码

微信服务号

微信订阅号

引用本文

相关视频

分享

微信扫一扫：分享

文章指标

历史

文章二维码