A High Dimensional Data Indexing Technique Based on Max Gap Space Mapping
Fund Project:

Supported by the National Natural Science Foundation of China under Grant Nos. 60273079, 60473074, 60573089; the National Basic Research Program of China (973 Program) under Grant No. 2006CB303103; and the National Research Foundation for the Doctoral Program of Higher Education of China under Grant No. DP0345710



    Abstract:

    In similarity query processing based on high-dimensional indexing, the search space is usually narrowed down by pruning inactive subspaces, that is, subspaces that contain no query results. However, some of the remaining active subspaces may not contain any query results either; these are called false active subspaces. Query processing performance clearly degrades as the number of accesses to false active subspaces grows. The problem becomes more serious for high-dimensional data: the experiments in this paper show that the number of accesses to false active subspaces increases with dimensionality. To overcome the problem, a space mapping approach is proposed to reduce such unnecessary accesses: for a given query, false active subspaces can be filtered out by further refining the query within the mapped space. To make this refinement efficient, a maximal gap space mapping strategy, MaxGapMapping, is proposed. Based on this mapping, a new index structure, the MS-tree, is designed and implemented, together with its construction algorithm and range query processing algorithm. Finally, the performance of the MS-tree is systematically compared with that of other index structures for range queries on a real data set.
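The Python sketch below makes the notion of a false active subspace concrete. It models a flat set of leaf nodes. Each node is first tested against its minimum bounding rectangle (MBR) and then against a one-dimensional mapped key. This is a minimal illustration under stated assumptions. It is not the paper's MS-tree or its MaxGapMapping. The `Node` class, the helper names, and the `max_gap_dimension` rule are all hypothetical. That rule projects each node onto the dimension whose sorted coordinates contain the widest empty gap, and it is our own guess inspired by the strategy's name. The mapped-space check is sound for any projection dimension. A point inside the query box must have its projected value inside the query's projected interval, so a node with no key in that interval can be skipped without losing results.

```python
from bisect import bisect_left
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def max_gap_dimension(points: Sequence[Point]) -> int:
    """Hypothetical mapping rule (our assumption, not the paper's):
    choose the dimension whose sorted coordinates have the widest gap."""
    best_dim, best_gap = 0, -1.0
    for i in range(len(points[0])):
        vals = sorted(p[i] for p in points)
        gap = max((b - a for a, b in zip(vals, vals[1:])), default=0.0)
        if gap > best_gap:
            best_dim, best_gap = i, gap
    return best_dim


class Node:
    """Hypothetical leaf node: its points, MBR corners and sorted 1-D keys."""

    def __init__(self, points: List[Point]) -> None:
        d = len(points[0])
        self.points = points
        self.lo = tuple(min(p[i] for p in points) for i in range(d))
        self.hi = tuple(max(p[i] for p in points) for i in range(d))
        self.dim = max_gap_dimension(points)             # mapped dimension
        self.keys = sorted(p[self.dim] for p in points)  # mapped values


def is_active(node: Node, qlo: Point, qhi: Point) -> bool:
    """Active subspace: the node's MBR intersects the query box."""
    return all(n_lo <= q_hi and q_lo <= n_hi
               for n_lo, n_hi, q_lo, q_hi in zip(node.lo, node.hi, qlo, qhi))


def passes_refinement(node: Node, qlo: Point, qhi: Point) -> bool:
    """Mapped-space check: does any key fall in [qlo[dim], qhi[dim]]?"""
    k = bisect_left(node.keys, qlo[node.dim])  # first key >= lower bound
    return k < len(node.keys) and node.keys[k] <= qhi[node.dim]


def range_query(nodes: List[Node], qlo: Point, qhi: Point,
                refine: bool = True) -> Tuple[List[Point], int, int]:
    """Return (results, nodes scanned, false active nodes scanned)."""
    results: List[Point] = []
    scanned = false_active = 0
    for node in nodes:
        if not is_active(node, qlo, qhi):
            continue  # inactive subspace, pruned by the MBR test alone
        if refine and not passes_refinement(node, qlo, qhi):
            continue  # false active subspace filtered in the mapped space
        scanned += 1
        hits = [p for p in node.points
                if all(a <= x <= b for a, x, b in zip(qlo, p, qhi))]
        if not hits:
            false_active += 1  # scanned, yet contributed no result
        results.extend(hits)
    return results, scanned, false_active


if __name__ == "__main__":
    a = Node([(0.0, 0.0), (10.0, 10.0)])
    b = Node([(5.0, 5.0), (20.0, 20.0)])
    c = Node([(5.0, 0.0), (0.0, 5.0)])
    box_lo, box_hi = (4.0, 4.0), (6.0, 6.0)
    print(range_query([a, b, c], box_lo, box_hi, refine=False))  # ([(5.0, 5.0)], 3, 2)
    print(range_query([a, b, c], box_lo, box_hi, refine=True))   # ([(5.0, 5.0)], 2, 1)
```

In the demo, node `a` has an MBR that intersects the query box but holds no matching point, so it is a false active subspace. The mapped-space check skips it. Node `c` is also false active but passes the one-dimensional check. This shows that a filter of this kind can reduce accesses to false active subspaces, but does not necessarily eliminate them.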

Cite this article:

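Wang Guoren, Huang Jianmei, Wang Bin, Han Donghong, Qiao Baiyou, Yu Ge. A high dimensional data indexing technique based on max gap space mapping. Journal of Software, 2007, 18(6): 1419-1428.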

History
  • Received: 2005-11-07
  • Last revised: 2006-08-16
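Copyright: Institute of Software, Chinese Academy of Sciences
Address: No. 4 South Fourth Street, Zhongguancun, Haidian District, Beijing; postal code: 100190
Tel: 010-62562563  Fax: 010-62562533  Email: jos@iscas.ac.cn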