基于最大间隙空间映射的高维数据索引技术
作者:
基金项目:

Supported by the National Natural Science Foundation of China under Grant Nos.60273079, 60473074, 60573089 (国家自然科学基金); the National Basic Research Program of China under Grant No. 2006CB303103 (国家重点基础研究发展计划(973)); the National Research Foundation for the Doctoral Program of Higher Education of China under Grant No.DP0345710 (国家教育部博士点基金)

  • 摘要
  • | |
  • 访问统计
  • |
  • 参考文献 [20]
  • |
  • 相似文献
  • |
  • 引证文献
  • | |
  • 文章评论
    摘要:

    在基于高维索引技术的相似性查询处理中,通常通过过滤那些不包含任何查询结果的非活动子空间来不断缩减搜索空间.但是在活动子空间中,有些可能根本就不包含任何查询结果,这样的活动子空间被称为假活动子空间.显然,查询处理性能会随着假活动子空间访问次数的增加而下降.这一问题在高维数据情况下将会变得更加严重,实验显示出随着维数的增加,假活动子空间的访问次数也会增加.为了解决这一问题,提出了一种空间映射方法来减少这种不必要的访问.对于一个给定的查询,可以通过在映射空间内进一步精炼该查询来过滤假活动子空间.为了提高映射空间内查询精炼的处理效率,提出了一个最大间隙空间映射策略--MaxGapMapping.基于这种映射方法,设计并实现了一种新的索引结构--MS-tree,给出了索引的构建算法和范围查询处理算法.最后对MS-tree及其他索引结构的性能进行了详细的比较和分析.

    Abstract:

    In the similarity query processing based on high dimensional indexing, the searching space is usually narrowed down by pruning the inactive subspaces which do not contain any query results. However, among the active subspaces, some of them do not contain any query results at all, those are called false active subspaces. It is obvious that the performance of query processing degrades in the presence of false active subspaces. The problem becomes seriously in the case of high dimensional data. The experiment in this paper shows that the number of accesses to false active subspaces increases as the dimensionality increases. In order to overcome the problem, a space mapping approach is proposed to reduce such unnecessary accesses. For a given query, it can be refined by filtering within its mapped space. A maximal gap space mapping strategy, MaxGapMapping, is proposed to improve the efficiency of the refinement processing. An index structure——MS-tree, the algorithms of construction, and query processing based on this refining method are designed and implemented. Finally, the performance of MS-tree is systemically compared with that of other competitors in terms of range queries based on a real data set.

    参考文献
    [1]Bohm C,Berchtold S,Keim DA.Seaching in high-dimensional spaces.ACM Computing Surveys,2001,33(3):322-373.
    [2]Berkmann N,Krigel HP,Schneider R,Seeger B.The R*-tree:An efficient and robust access method for points and rectangles.SIGMOD Record,1990,19(2):322-331.
    [3]Katayama N,Satoh S.The SR-tree:An index structure for high-dimensional nearest neighbor queries.SIGMOD Record,1997,26(2):369-380.
    [4]White DA,Jain R.Similarity indexing with the SS-tree.In:Su SYW,ed.Proc.of the 20th Int'l Conf.on Data Engineering.New Orleans:IEEE Computer Society,1996.516-523.
    [5]Lin KI,Jagadish HV,Faloutsos C.The TV-tree:An index structure for high-dimensional Data.VLDB Journal,1994,3(4):517-542.
    [6]Fu A,Chan P,Cheung Y,Moon Y.Dynamical VP-tree indexing for n-nearest neighbor search given pair-wise distances.VLDB Journal,2000,9(2):154-173.
    [7]Bozkaya T,Ozsoyoglu M.Distance-Based indexing for high-dimensional metric spaces.SIGMOD Record,1997,26(2):357-368.
    [8]Ciaccia P,Patella M,Zezula P.M-tree:An efficient access method for similarity search in metric spaces.In:Jarke M,Carey MJ,Dittrich KR,Lochovsky FH,Loucopoulos P,Jeusfeld MA,eds.Proc.of the 23rd Int'l Conf.on Very Large Data Bases.San Fransisco:Morgan Kaufamann Publishers,1997.426-435.
    [9]Cui B,Ooi BC,Su J,Tan K.Contorting high dimensional data for efficient main memory processing.In:Halevy AY,Ives ZG,Doan A,eds.Proc.of the ACM SIGMOD Int'l Conf.on Management of Data.San Diego:ACM,2003.479-490.
    [10]Jr.Traina C,Traina A,Seeger B,Faloutsos C.Slim-trees:High performance metric trees minimizing overlap between nodes.In:Zaniolo C,Lockemann PC,Scholl MH,Grust T,eds.Proc.of the 7th Int'l Conf.on Extending Database Technology.Springer-Verlag,2000.51-65.
    [11]Zhou X,Wang G,Yu JX,Yu G.M+-tree:A new dynamical multidimensional index for metric spaces.In:Schewe KD,Zhou X,eds.Proc.of the 14th Australasian Database Conf.Australian Computer Society,2003.161-168.
    [12]Chakrabarti K,Mehrotra S.Local dimensionality reduction:A new approach to indexing high dimensional spaces.In:Abbadi AE,rodie ML,Chakravarthy S,Dayal U,Kamel N,Schlageter G,Whang KY,eds.Proc.of the 26th Int'l Conf.on Very Large Data Bases.San Fransisco:Morgan Kaufamann Publishers,2000.89-100.
    [13]Wang G,Lu H,Yu G,Bao Y.Managing very large document collections using semantics.Journal of Computer Science and Technology,2003,18(3):403-406.
    [14]Ciaccia P,Patella M.Bulk loading the M-tree.In:Roddick JF,ed.Proc.of the 9th Australasian Database Conf.Perth:Springer-Verlag,1998.15-26.
    [15]Zhou XM,Wang GR,Chang LZ,Fan D.Bulk-Loading M+-tree.Mini-Micro Systems,2006,27(2):295-299 (in Chinese with English abstract).
    [16]Feng Y,Cao K,Cao Z.A multidimensional index structure for fast similarity retrieval.Journal of Software,2002,13(8):1678-1685 (in Chinese with English abstract).http://www.jos.org.cn/1000-9825/13/1678.pdf
    [17]Dong D,Liu Z,Xue X.VA-trie:A new and efficient high dimensional index structure for approximate k nearest neighbor query.Journal of Computer Research and Development,2005,42(12):2213-2218 (in Chinese with English abstract).
    [15]周项敏,王国仁,常立喆,范丹.批量构建M+-tree.小型微型计算机系统,2006,27(2):295-299.
    [16]冯玉才,曹奎,曹忠升.一种支持快速相似性检索的多维索引结构.软件学报,2002,13(8):1678-1685.http://www.jos.org.cn/ 1000-9825/13/1678.pdf
    [17]董道国,刘振中,薛向阳.VA-Trie:一种用于近似k近邻查询的高维索引结构.计算机研究与发展,2005,42(12):2213-2218.
    相似文献
    网友评论
    网友评论
    分享到微博
    发 布
引用本文

王国仁,黄健美,王斌,韩东红,乔百友,于戈.基于最大间隙空间映射的高维数据索引技术.软件学报,2007,18(6):1419-1428

复制
分享
文章指标
  • 点击次数:
  • 下载次数:
  • HTML阅读次数:
  • 引用次数:
历史
  • 收稿日期:2005-11-07
  • 最后修改日期:2006-08-16
文章二维码
您是第19819546位访问者
版权所有:中国科学院软件研究所 京ICP备05046678号-3
地址:北京市海淀区中关村南四街4号,邮政编码:100190
电话:010-62562563 传真:010-62562533 Email:jos@iscas.ac.cn
技术支持:北京勤云科技发展有限公司

京公网安备 11040202500063号