Open Environmental Locality-sensitive Hashing Retrieval for Multiple Distributed Characteristics
Author:
Affiliation:

Clc Number:

Fund Project:

  • Article
  • |
  • Figures
  • |
  • Metrics
  • |
  • Reference
  • |
  • Related
  • |
  • Cited by
  • |
  • Materials
  • |
  • Comments
    Abstract:

    The retrieval methods based-on locality-sensitive hashing (LSH) provide a feasible solution to the problem of approximate nearest neighbor (ANN) search on high-dimensional, multiple distributed characteristics, and massive data. However, there are still some unresolved problems in open environment, such as poor adaptability to the data with multiple distribution characteristics. Based on the fact that Laplacian operator is sensitive to sharp changes in data, an LSH retrieval method based on Laplacian operator (LPLSH) is proposed, which is suitable for data in open environment with a variety of distributed characteristics, and can segment data on global view. By applying Laplacian operator to the probability density distribution of data projection, the position of the sharp change of distribution will found as the offset of the hyperplane. This study proves theoretically that the reduced dimension can keep the local sensitivity characteristics of the hash function, and the global low projection density interval segmentation is helpful to improve the precision. The guiding significance of using Laplacian operator to obtain the second derivative to set the hyperplane offset is also analyzed. Compared with the other 8 methods based on LSH, the F1 value of LPLSH is 0.8-5 times of the optimal value of other methods, and it takes less time. Through the analysis of the distribution characteristics of experimental datasets, the experimental results show that LPLSH can take into account the efficiency, accuracy, and recall rate at the same time, can meet the robustness requirements of large-scale high- dimensional retrieval with multi-distribution characteristics in open environment.

    Reference
    Related
    Cited by
Get Citation

张仕,赖会霞,肖如良,潘淼鑫,张路路,陈伟林.开放环境多分布特性的局部敏感哈希检索方法.软件学报,2022,33(4):1200-1217

Copy
Share
Article Metrics
  • Abstract:
  • PDF:
  • HTML:
  • Cited by:
History
  • Received:February 18,2021
  • Revised:July 16,2021
  • Adopted:
  • Online: October 26,2021
  • Published: April 06,2022
You are the firstVisitors
Copyright: Institute of Software, Chinese Academy of Sciences Beijing ICP No. 05046678-4
Address:4# South Fourth Street, Zhong Guan Cun, Beijing 100190,Postal Code:100190
Phone:010-62562563 Fax:010-62562533 Email:jos@iscas.ac.cn
Technical Support:Beijing Qinyun Technology Development Co., Ltd.

Beijing Public Network Security No. 11040202500063