Abstract: With the widespread use of multi-source, heterogeneous, and multi-modal data in scenarios such as large models and data lakes, demand for vector-based data retrieval and storage management has grown significantly. By mapping heterogeneous data into high-dimensional vector representations and building vector indices over them, vector databases support unified management of diverse data types and high-quality similarity search, which makes them a crucial foundation for applications such as generative retrieval and AI-native databases. However, existing vector databases face significant bottlenecks in storage and indexing efficiency, index construction complexity, and retrieval accuracy. Specifically, massive collections of high-dimensional vectors increase storage overhead and index maintenance costs, vector index structures are often bloated and consume substantial memory, and the loss of retrieval accuracy caused by compression distortion remains an unresolved challenge. This study proposes a framework based on weight residual vector quantization (WRVQ). The method decouples the quantization direction from the residual magnitude: it stores the residual direction as a unit vector and appends a weight marker that records the magnitude, which enables efficient compression and storage with very low distortion. For indexing, a three-layer inverted index structure tailored to the characteristics of WRVQ is designed, comprising an exact match layer, a fuzzy match layer, and a search layer. This structure integrates asymmetric distance computation (ADC) with nearest neighbor search techniques to realize approximate nearest neighbor (ANN) search that balances accuracy and efficiency.
Experimental results on large-scale datasets demonstrate that, compared with traditional low-dimensional embedding models and existing quantization methods, WRVQ achieves significant improvements in quantization loss, storage compression ratio, and retrieval recall. It also shows considerable advantages in both index construction and query performance.
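
The direction/magnitude decoupling and the ADC step described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names (`encode`, `decode`, `adc_distance`), the single coarse codebook, the random direction codebook, and the uniform scalar quantization of the weight with an upper bound `w_max` are all assumptions made for illustration. Because each stored direction is a unit vector, the squared distance between a query q and a reconstructed vector c + w·d expands to ‖q − c‖² − 2w·(q − c)·d + w², so it can be evaluated from the codes without reconstructing the vector.

```python
import numpy as np

def encode(x, centroids, directions, w_max, levels=256):
    """Encode x as (centroid id, direction id, weight code). Hypothetical scheme."""
    # Coarse step: nearest centroid by squared Euclidean distance.
    c_id = int(np.argmin(np.sum((centroids - x) ** 2, axis=1)))
    r = x - centroids[c_id]                 # residual
    w = float(np.linalg.norm(r))            # residual magnitude (the "weight")
    if w == 0.0:
        return c_id, 0, 0                   # x coincides with its centroid
    u = r / w                               # residual direction (unit vector)
    # Direction step: the closest unit codeword has the largest dot product.
    d_id = int(np.argmax(directions @ u))
    # Weight step: uniform scalar quantization of the magnitude on [0, w_max].
    w_code = int(round(min(w, w_max) / w_max * (levels - 1)))
    return c_id, d_id, w_code

def decode(code, centroids, directions, w_max, levels=256):
    """Reconstruct an approximation of the original vector from its code."""
    c_id, d_id, w_code = code
    w = w_code / (levels - 1) * w_max
    return centroids[c_id] + w * directions[d_id]

def adc_distance(q, code, centroids, directions, w_max, levels=256):
    """Squared distance from an uncompressed query to a compressed vector."""
    c_id, d_id, w_code = code
    w = w_code / (levels - 1) * w_max
    t = q - centroids[c_id]
    # Uses ||d|| = 1, so the last term is w^2 rather than w^2 * ||d||^2.
    return float(t @ t - 2.0 * w * (t @ directions[d_id]) + w * w)

# Toy data: 4 coarse centroids and 16 unit-norm direction codewords in 8-D.
rng = np.random.default_rng(0)
centroids = rng.normal(size=(4, 8))
directions = rng.normal(size=(16, 8))
directions /= np.linalg.norm(directions, axis=1, keepdims=True)

x = rng.normal(size=8)
q = rng.normal(size=8)
code = encode(x, centroids, directions, w_max=8.0)
x_hat = decode(code, centroids, directions, w_max=8.0)
dist = adc_distance(q, code, centroids, directions, w_max=8.0)
```

In this sketch `dist` agrees with the squared distance between `q` and `x_hat` up to floating-point error, which is the property an ADC-based index relies on. A real system would train both codebooks on data, for example with k-means, rather than sampling them randomly.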