基于可扩展LSH的高维动态数据索引

微信服务号

微信订阅号

2025年4月24日 5:58 星期四

首页 > 过刊浏览>2015年第26卷第S2期 >228-238

PDF HTML阅读 XML下载导出引用引用提醒

基于可扩展LSH的高维动态数据索引
DOI:
                        
                    
CSTR:
                        
                    
作者:
                        胡海苗胡海苗
数字媒体北京市重点实验室(北京航空航天大学), 北京 100191
在期刊界中查找
在百度中查找
在本站中查找
姜帆姜帆
数字媒体北京市重点实验室(北京航空航天大学), 北京 100191
在期刊界中查找
在百度中查找
在本站中查找

                    
作者单位:
作者简介:
通讯作者:
中图分类号:
基金项目:国家自然科学基金(61370121);国家高技术研究发展计划(863)(2014AA015102)

Scalable Locality Sensitive Hashing Scheme for Dynamic High-Dimensional Data Indexing

Author:

HU Hai-Miao
HU Hai-Miao
Beijing Key Laboratory of Digital Media(BeiHang University), Beijing 100191, China
在期刊界中查找
在百度中查找
在本站中查找
JIANG Fan
JIANG Fan
Beijing Key Laboratory of Digital Media(BeiHang University), Beijing 100191, China
在期刊界中查找
在百度中查找
在本站中查找

Affiliation:

Fund Project:

摘要

图/表

访问统计

参考文献 [22]

相似文献 [20]

引证文献

资源附件

文章评论

摘要:

提出了一种可扩展的局部敏感哈希索引(SLSH),以解决高维动态数据索引中,由于数据集大小及分布特征无法确定而导致索引效率降低的问题.SLSH架构于E2LSH之上,继承了其对高维数据索引速度快,并可直接对欧式空间上的数据点进行索引的特点.为了使得哈希索引具有动态的相似性区分能力,SLSH修改了E2LSH的哈希族,通过哈希桶容量约束自适应调节哈希参数.因此对于分布密度动态变化的数据空间,SLSH也能够给出鲁棒的划分.

关键词:相似性检索;近似最近邻搜索;可扩展局部敏感哈希;动态高维数据索引

Abstract:

A scalable locality sensitive hashing (SLSH) scheme is proposed to solve the problem of indexing high-dimensional data for dynamic datasets. The dynamic property destabilizes the size of the dataset, fuzzes up the tendency of data distribution, and conduces to the retrogression of retrieval performance. SLSH inherits two very convenient properties from the novel E2LSH that SLSH can rapidly work on data that is extremely high-dimensional and directly works on Euclidean space. For the purpose of adaptively fit the dynamic data distribution, the original hash family in E2LSH is altered for SLSH. A constraint of hash bucket capacity is applied for the hash parameters adjustment. As a result, SLSH provides robust partitions in the high-dimensional space for the dynamic data.

参考文献

[1] Novak D, Batko M, Zezula P. Metric index:An efficient and scalable solution for precise and approximate similarity search. Information Systems, 2011,36(4):721-733.

[2] Jégou H, Schmid C, Harzallah H, Verbeek J. Accurate image search using the contextual dissimilarity measure. IEEE Trans. on Pattern Analysis and Machine Intelligence, 2010,32(1):2-11.

[3] Cai JJ, Liu Q, Chen F, Joshi D, Tian Q. Scalable image search with multiple index tables. In:Proc. of the Int'l Conf. on Multimedia Retrieval. 2014. 407.

[4] Zhu CZ, Satoh S. Large vocabulary quantization for searching instances from videos. In:Proc. of the 2nd ACM Int'l Conf. on Multimedia Retrieval. 2012. 52.

[5] Gionis A, Indyk P, Motwani R. Similarity search in high dimensions via hashing. VLDB, 1999,99:518-529.

[6] Indyk P. Nearest neighbors in high-dimensional spaces. 2004.

[7] Xia H, Wu PC, Hoi SCH, Jin R. Boosting multi-kernel locality-sensitive hashing for scalable image retrieval. In:Proc. of the 35th Int'l ACM SIGIR Conf. on Research and Development in Information Retrieval. 2012. 55-64.

[8] Nolen M, Lin KI. Approximate high-dimensional nearest neighbor queries using R-forests. In:Proc. of the 17th Int'l Database Engineering & Applications Symp. 2013. 48-57.

[9] Indyk P, Motwani R. Approximate nearest neighbors:Towards removing the curse of dimensionality. In:Proc. of the 30th Annual ACM Symp. on Theory of Computing. 1998. 604-613.

[10] Paulevé L, Jégou H, Amsaleg L. Locality sensitive hashing:A comparison of hash function types and querying mechanisms. Pattern Recognition Letters, 2010,31(11):1348-1358.

[11] Deng C, Deng HR, Liu XL, Yuan Y. Adaptive multi-bit quantization for hashing. Neurocomputing, 2015,151:319-326.

[12] Jin ZM, Li C, Lin Y, Cai D. Density sensitive hashing. IEEE Trans. on Cybernetics, 2014,44(8):1362-1371.

[13] Gao JY, Jagadish HV, Lu W, Ooi BC. DSH:Data sensitive hashing for high-dimensional k-nnsearch. In:Proc. of the 2014 ACM SIGMOD Int'l Conf. on Management of Data. 2014. 1127-1138.

[14] Weiss Y, Torralba A, Fergus R. Spectral hashing. In:Advances in Neural Information Processing Systems. 2009. 1753-1760.

[15] Bawa M, Condie T, Ganesan P. LSH forest:Self-Tuning indexes for similarity search. In:Proc. of the 14th Int'l Conf. on World Wide Web. 2005. 651-660.

[16] Jagadish HV, Ooi BC, Vu QH. Baton:A balanced tree structure for peer-to-peer networks. In:Proc. of the 31st Int'l Conf. on Very Large Data Bases. 2005. 661-672.

[17] Larson P. Dynamic hashing. BIT Numerical Mathematics, 1978,18(2):184-201.

[18] Andoni A, Indyk P. Near-Optimal hashing algorithms for approximate nearest neighbor in high dimensions. In:Proc. of the 47th IEEE Symp. on Foundations of Computer Science. 2006. 459-468.

[19] Datar M, Immorlica N, Indyk P, Mirrokni VS. Locality-Sensitive hashing scheme based on p-stable distributions. In:Proc. of the 20th Annual Symp. on Computational Geometry. 2004. 253-262.

[20] Andoni A. Nearest neighbor search:The old, the new, and the impossible. Massachusetts Institute of Technology, 2009.

[21] Indyk P. Stable distributions, pseudorandom generators, embeddings and data stream computation. Journal of the ACM, 2006,53(3):307-323.

[22] Robinson JT. The KDB-tree:A search structure for large multidimensional dynamic indexes. In:Proc. of the 1981 ACM SIGMOD Int'l Conf. on Management of Data. 1981. 10-18.

引用本文

胡海苗,姜帆.基于可扩展LSH的高维动态数据索引.软件学报,2015,26(S2):228-238

复制

文章指标

点击次数:2469
下载次数: 4829
HTML阅读次数: 0
引用次数: 0

历史

收稿日期:2015-05-15
最后修改日期:2015-10-12
录用日期:
在线发布日期: 2016-01-11
出版日期:

微信服务号

微信订阅号

引用本文

分享

文章指标

历史

文章二维码

微信服务号

微信订阅号

引用本文

分享

微信扫一扫：分享

文章指标

历史

文章二维码