向量数据库的K近邻图高效更新方法
CSTR:
作者:
作者单位:

作者简介:

通讯作者:

中图分类号:

基金项目:

国家重点研发计划 (2023YFB4503600); 国家自然科学基金 (62525202, 62232009); 深圳市承接国家重大科技项目 (CJGJZD20230724093403007); 高速铁路与城轨交通系统技术国家工程研究中心实验室基础研究项目&中国国家铁路集团有限公司科技研究开发计划 (L2024W001)


Efficient Updating Method for K-nearest Neighbor Graph in Vector Databases
Author:
Affiliation:

Fund Project:

  • 摘要
  • |
  • 图/表
  • |
  • 访问统计
  • |
  • 参考文献
  • |
  • 相似文献
  • |
  • 引证文献
  • |
  • 资源附件
  • |
  • 文章评论
    摘要:

    在高维数据处理中, K近邻图作为一种关键的数据结构, 广泛应用于聚类、图神经网络和推荐系统等领域. 然而, 随着预训练嵌入模型在非结构化数据建模与检索中的广泛使用, 嵌入模型的微调逐渐成为提升嵌入向量的语义表示能力的核心步骤. 嵌入微调通常会导致全部数据的向量表示发生系统性变化, 从而使原有K近邻图的邻接关系失效. 现有研究主要关注于如何为静态数据构建K近邻图, 缺乏对微调后的嵌入向量进行快速适应的研究. 为此, 提出一种面向嵌入模型微调场景的高效K近邻图更新方法FastAdjust. 该方法基于嵌入模型微调为每条数据嵌入带来的影响较小的观察, 通过局部更新策略对原始K近邻图进行增量调整, 在确保最终K近邻图质量的同时, 显著提升更新效率. 具体而言, 首先, FastAdjust利用基于乘积量化的聚类结构, 为每条数据高效且准确地定位可能成为邻居的数据子集, 缩小候选邻居搜索范围; 其次, 基于数据密度和嵌入变化幅度, FastAdjust结合二者与数据K近邻变化程度的相关性, 为邻居关系变化程度不同的数据针对性地分配不同的更新资源, 从而提升整体更新效率. 真实数据集上的实验结果表明, FastAdjust在嵌入模型微调的场景下能够快速调整K近邻图, 准确地适应数据嵌入的变化, 同时大幅减少计算开销, 具有良好的实用价值和扩展性.

    Abstract:

    In high-dimensional data processing, the K-nearest neighbor (KNN) graph is a critical data structure widely used in tasks such as clustering, graph neural networks, and recommendation systems. However, with the increasing use of pretrained embedding models in unstructured data modeling and retrieval, embedding model fine-tuning has become a key step in enhancing the semantic representation capability of embeddings. Such fine-tuning often leads to systematic changes in the vector representations of all data points, which invalidates the original neighborhood relationships in the KNN graph. Existing research primarily focuses on building KNN graphs for static data, lacking efficient solutions for adapting to updated embeddings after fine-tuning. To address this gap, this study proposes FastAdjust, an efficient KNN graph update method tailored for embedding model fine-tuning scenarios. Leveraging the observation that fine-tuning usually causes only minor changes to individual embeddings, incremental adjustments to the original KNN graph are performed by FastAdjust through a local update strategy, significantly improving update efficiency while maintaining graph quality. Specifically, FastAdjust first employs a clustering structure based on product quantization to efficiently and accurately locate a subset of candidate neighbors for each data point, thus narrowing the search space. Secondly, based on data density and the magnitude of embedding variation, FastAdjust leverages their correlation with changes in the KNNs to adaptively allocate update resources according to the degree of neighbor relationship changes, thus improving overall update efficiency. Experimental results on real-world datasets demonstrate that FastAdjust efficiently and accurately adapts KNN graphs to embedding updates with significantly reduced computational cost, showing strong practical value and scalability.

    参考文献
    相似文献
    引证文献
引用本文

王嘉翼,徐士惠,李国良.向量数据库的K近邻图高效更新方法.软件学报,2026,37(3):1006-1020

复制
相关视频

分享
文章指标
  • 点击次数:
  • 下载次数:
  • HTML阅读次数:
  • 引用次数:
历史
  • 收稿日期:2025-05-06
  • 最后修改日期:2025-06-30
  • 录用日期:
  • 在线发布日期: 2025-09-02
  • 出版日期:
文章二维码
您是第位访问者
版权所有:中国科学院软件研究所 京ICP备05046678号-3
地址:北京市海淀区中关村南四街4号,邮政编码:100190
电话:010-62562563 传真:010-62562533 Email:jos@iscas.ac.cn
技术支持:北京勤云科技发展有限公司

京公网安备 11040202500063号