基于可扩展LSH的高维动态数据索引
作者:
基金项目:

国家自然科学基金(61370121);国家高技术研究发展计划(863)(2014AA015102)


Scalable Locality Sensitive Hashing Scheme for Dynamic High-Dimensional Data Indexing
Author:
  • 摘要
  • | |
  • 访问统计
  • |
  • 参考文献 [22]
  • |
  • 相似文献 [20]
  • | | |
  • 文章评论
    摘要:

    提出了一种可扩展的局部敏感哈希索引(SLSH),以解决高维动态数据索引中,由于数据集大小及分布特征无法确定而导致索引效率降低的问题.SLSH架构于E2LSH之上,继承了其对高维数据索引速度快,并可直接对欧式空间上的数据点进行索引的特点.为了使得哈希索引具有动态的相似性区分能力,SLSH修改了E2LSH的哈希族,通过哈希桶容量约束自适应调节哈希参数.因此对于分布密度动态变化的数据空间,SLSH也能够给出鲁棒的划分.

    Abstract:

    A scalable locality sensitive hashing (SLSH) scheme is proposed to solve the problem of indexing high-dimensional data for dynamic datasets. The dynamic property destabilizes the size of the dataset, fuzzes up the tendency of data distribution, and conduces to the retrogression of retrieval performance. SLSH inherits two very convenient properties from the novel E2LSH that SLSH can rapidly work on data that is extremely high-dimensional and directly works on Euclidean space. For the purpose of adaptively fit the dynamic data distribution, the original hash family in E2LSH is altered for SLSH. A constraint of hash bucket capacity is applied for the hash parameters adjustment. As a result, SLSH provides robust partitions in the high-dimensional space for the dynamic data.

    参考文献
    [1] Novak D, Batko M, Zezula P. Metric index:An efficient and scalable solution for precise and approximate similarity search. Information Systems, 2011,36(4):721-733.
    [2] Jégou H, Schmid C, Harzallah H, Verbeek J. Accurate image search using the contextual dissimilarity measure. IEEE Trans. on Pattern Analysis and Machine Intelligence, 2010,32(1):2-11.
    [3] Cai JJ, Liu Q, Chen F, Joshi D, Tian Q. Scalable image search with multiple index tables. In:Proc. of the Int'l Conf. on Multimedia Retrieval. 2014. 407.
    [4] Zhu CZ, Satoh S. Large vocabulary quantization for searching instances from videos. In:Proc. of the 2nd ACM Int'l Conf. on Multimedia Retrieval. 2012. 52.
    [5] Gionis A, Indyk P, Motwani R. Similarity search in high dimensions via hashing. VLDB, 1999,99:518-529.
    [6] Indyk P. Nearest neighbors in high-dimensional spaces. 2004.
    [7] Xia H, Wu PC, Hoi SCH, Jin R. Boosting multi-kernel locality-sensitive hashing for scalable image retrieval. In:Proc. of the 35th Int'l ACM SIGIR Conf. on Research and Development in Information Retrieval. 2012. 55-64.
    [8] Nolen M, Lin KI. Approximate high-dimensional nearest neighbor queries using R-forests. In:Proc. of the 17th Int'l Database Engineering & Applications Symp. 2013. 48-57.
    [9] Indyk P, Motwani R. Approximate nearest neighbors:Towards removing the curse of dimensionality. In:Proc. of the 30th Annual ACM Symp. on Theory of Computing. 1998. 604-613.
    [10] Paulevé L, Jégou H, Amsaleg L. Locality sensitive hashing:A comparison of hash function types and querying mechanisms. Pattern Recognition Letters, 2010,31(11):1348-1358.
    [11] Deng C, Deng HR, Liu XL, Yuan Y. Adaptive multi-bit quantization for hashing. Neurocomputing, 2015,151:319-326.
    [12] Jin ZM, Li C, Lin Y, Cai D. Density sensitive hashing. IEEE Trans. on Cybernetics, 2014,44(8):1362-1371.
    [13] Gao JY, Jagadish HV, Lu W, Ooi BC. DSH:Data sensitive hashing for high-dimensional k-nnsearch. In:Proc. of the 2014 ACM SIGMOD Int'l Conf. on Management of Data. 2014. 1127-1138.
    [14] Weiss Y, Torralba A, Fergus R. Spectral hashing. In:Advances in Neural Information Processing Systems. 2009. 1753-1760.
    [15] Bawa M, Condie T, Ganesan P. LSH forest:Self-Tuning indexes for similarity search. In:Proc. of the 14th Int'l Conf. on World Wide Web. 2005. 651-660.
    [16] Jagadish HV, Ooi BC, Vu QH. Baton:A balanced tree structure for peer-to-peer networks. In:Proc. of the 31st Int'l Conf. on Very Large Data Bases. 2005. 661-672.
    [17] Larson P. Dynamic hashing. BIT Numerical Mathematics, 1978,18(2):184-201.
    [18] Andoni A, Indyk P. Near-Optimal hashing algorithms for approximate nearest neighbor in high dimensions. In:Proc. of the 47th IEEE Symp. on Foundations of Computer Science. 2006. 459-468.
    [19] Datar M, Immorlica N, Indyk P, Mirrokni VS. Locality-Sensitive hashing scheme based on p-stable distributions. In:Proc. of the 20th Annual Symp. on Computational Geometry. 2004. 253-262.
    [20] Andoni A. Nearest neighbor search:The old, the new, and the impossible. Massachusetts Institute of Technology, 2009.
    [21] Indyk P. Stable distributions, pseudorandom generators, embeddings and data stream computation. Journal of the ACM, 2006,53(3):307-323.
    [22] Robinson JT. The KDB-tree:A search structure for large multidimensional dynamic indexes. In:Proc. of the 1981 ACM SIGMOD Int'l Conf. on Management of Data. 1981. 10-18.
    引证文献
    网友评论
    网友评论
    分享到微博
    发 布
引用本文

胡海苗,姜帆.基于可扩展LSH的高维动态数据索引.软件学报,2015,26(S2):228-238

复制
分享
文章指标
  • 点击次数:2469
  • 下载次数: 4829
  • HTML阅读次数: 0
  • 引用次数: 0
历史
  • 收稿日期:2015-05-15
  • 最后修改日期:2015-10-12
  • 在线发布日期: 2016-01-11
文章二维码
您是第19863458位访问者
版权所有:中国科学院软件研究所 京ICP备05046678号-3
地址:北京市海淀区中关村南四街4号,邮政编码:100190
电话:010-62562563 传真:010-62562533 Email:jos@iscas.ac.cn
技术支持:北京勤云科技发展有限公司

京公网安备 11040202500063号