A scalable locality sensitive hashing (SLSH) scheme is proposed to solve the problem of indexing high-dimensional data for dynamic datasets. The dynamic property destabilizes the size of the dataset, fuzzes up the tendency of data distribution, and conduces to the retrogression of retrieval performance. SLSH inherits two very convenient properties from the novel E2LSH that SLSH can rapidly work on data that is extremely high-dimensional and directly works on Euclidean space. For the purpose of adaptively fit the dynamic data distribution, the original hash family in E2LSH is altered for SLSH. A constraint of hash bucket capacity is applied for the hash parameters adjustment. As a result, SLSH provides robust partitions in the high-dimensional space for the dynamic data.
[1] Novak D, Batko M, Zezula P. Metric index:An efficient and scalable solution for precise and approximate similarity search. Information Systems, 2011,36(4):721-733.
[2] Jégou H, Schmid C, Harzallah H, Verbeek J. Accurate image search using the contextual dissimilarity measure. IEEE Trans. on Pattern Analysis and Machine Intelligence, 2010,32(1):2-11.
[3] Cai JJ, Liu Q, Chen F, Joshi D, Tian Q. Scalable image search with multiple index tables. In:Proc. of the Int'l Conf. on Multimedia Retrieval. 2014. 407.
[4] Zhu CZ, Satoh S. Large vocabulary quantization for searching instances from videos. In:Proc. of the 2nd ACM Int'l Conf. on Multimedia Retrieval. 2012. 52.
[5] Gionis A, Indyk P, Motwani R. Similarity search in high dimensions via hashing. VLDB, 1999,99:518-529.
[6] Indyk P. Nearest neighbors in high-dimensional spaces. 2004.
[7] Xia H, Wu PC, Hoi SCH, Jin R. Boosting multi-kernel locality-sensitive hashing for scalable image retrieval. In:Proc. of the 35th Int'l ACM SIGIR Conf. on Research and Development in Information Retrieval. 2012. 55-64.
[8] Nolen M, Lin KI. Approximate high-dimensional nearest neighbor queries using R-forests. In:Proc. of the 17th Int'l Database Engineering & Applications Symp. 2013. 48-57.
[9] Indyk P, Motwani R. Approximate nearest neighbors:Towards removing the curse of dimensionality. In:Proc. of the 30th Annual ACM Symp. on Theory of Computing. 1998. 604-613.
[10] Paulevé L, Jégou H, Amsaleg L. Locality sensitive hashing:A comparison of hash function types and querying mechanisms. Pattern Recognition Letters, 2010,31(11):1348-1358.
[11] Deng C, Deng HR, Liu XL, Yuan Y. Adaptive multi-bit quantization for hashing. Neurocomputing, 2015,151:319-326.
[12] Jin ZM, Li C, Lin Y, Cai D. Density sensitive hashing. IEEE Trans. on Cybernetics, 2014,44(8):1362-1371.
[13] Gao JY, Jagadish HV, Lu W, Ooi BC. DSH:Data sensitive hashing for high-dimensional k-nnsearch. In:Proc. of the 2014 ACM SIGMOD Int'l Conf. on Management of Data. 2014. 1127-1138.
[14] Weiss Y, Torralba A, Fergus R. Spectral hashing. In:Advances in Neural Information Processing Systems. 2009. 1753-1760.
[15] Bawa M, Condie T, Ganesan P. LSH forest:Self-Tuning indexes for similarity search. In:Proc. of the 14th Int'l Conf. on World Wide Web. 2005. 651-660.
[16] Jagadish HV, Ooi BC, Vu QH. Baton:A balanced tree structure for peer-to-peer networks. In:Proc. of the 31st Int'l Conf. on Very Large Data Bases. 2005. 661-672.
[17] Larson P. Dynamic hashing. BIT Numerical Mathematics, 1978,18(2):184-201.
[18] Andoni A, Indyk P. Near-Optimal hashing algorithms for approximate nearest neighbor in high dimensions. In:Proc. of the 47th IEEE Symp. on Foundations of Computer Science. 2006. 459-468.
[19] Datar M, Immorlica N, Indyk P, Mirrokni VS. Locality-Sensitive hashing scheme based on p-stable distributions. In:Proc. of the 20th Annual Symp. on Computational Geometry. 2004. 253-262.
[20] Andoni A. Nearest neighbor search:The old, the new, and the impossible. Massachusetts Institute of Technology, 2009.
[21] Indyk P. Stable distributions, pseudorandom generators, embeddings and data stream computation. Journal of the ACM, 2006,53(3):307-323.
[22] Robinson JT. The KDB-tree:A search structure for large multidimensional dynamic indexes. In:Proc. of the 1981 ACM SIGMOD Int'l Conf. on Management of Data. 1981. 10-18.