LA-tree: 查询感知的自适应学习型多维索引
作者:
作者单位:

作者简介:

通讯作者:

中图分类号:

TP311

基金项目:

国家自然科学基金(62436010, 62502520); 北京市自然科学基金-海淀原始创新联合基金 (L222006)


LA-tree: Query-aware Adaptive Learned Multi-dimensional Index
Author:
Affiliation:

Fund Project:

  • 摘要
  • |
  • 图/表
  • |
  • 访问统计
  • |
  • 参考文献
  • |
  • 相似文献
  • |
  • 引证文献
  • |
  • 资源附件
  • |
  • 文章评论
    摘要:

    结构化数据分析通常需要在表格数据的多维属性上执行联合范围查询, 高效的多维索引因此成为数据库系统的关键支撑. 然而, 现有多维索引方法在高维场景下存在局限: 传统多维索引仅按数据分布进行均匀划分, 缺乏对查询特征的感知, 导致筛选效果有限; 而现有学习型多维索引虽引入查询感知, 但划分往往极不均匀, 使部分单元过大, 扫描成本显著增加. 为了解决上述问题, 提出一种新型的LA-tree学习型树形多维索引, 同时兼顾数据分布与查询负载感知. 在离线构建阶段, LA-tree将节点维度选择建模为最小化查询扫描比的问题, 并提出分层贪心搜索算法, 实现了均匀划分与查询感知的统一. 在在线查询阶段, 引入轻量线性模型与分段线性模型, 将传统的数值比较转化为快速映射计算, 在保证结果完整性的同时显著降低筛选延迟. 在动态场景中, 提出基于扫描量监控的自适应增量更新机制, 通过局部子树重构高效适配数据与查询负载的变化, 避免了整体索引重建的高昂代价. 实验结果表明, LA-tree在多个真实和基准数据集上均显著优于现有方法: 在静态场景中查询用时较最佳基准方法平均降低52%, 在动态场景中更新开销较重构方法减少97%, 同时保持低查询延迟与轻量级索引规模.

    Abstract:

    Structured data analysis typically requires performing multi-attribute queries over tabular data, making efficient multi-dimensional indexes key support for database systems. However, existing multi-dimensional indexing methods face limitations in high-dimensional scenarios. Traditional multi-dimensional indexing methods partition data uniformly based on data distribution but lack the awareness of query features, resulting in limited filtering effectiveness. In contrast, although existing learned multi-dimensional indexes introduce query-awareness, they often produce highly unbalanced partitions, thereby resulting in some oversized partitions and substantially increased scanning costs. To this end, this study proposes LA-tree, a novel learned tree-based multi-dimensional index that balances both data distribution and query workload awareness. In the offline construction phase, LA-tree formulates the selection of partitioning dimensions at each node as an optimization problem of minimizing the overall scan ratio of the query workload, and puts forward a hierarchical greedy search algorithm to achieve the unity of uniform partitioning and query-awareness. In the online query phase, the lightweight linear model and piecewise linear model are introduced to transform traditional numerical comparisons to fast mapping computations, thereby reducing filtering latency while ensuring the completeness of query results. In dynamic settings, an adaptive incremental update mechanism based on scan volume monitoring is proposed to efficiently adapt to changes in data and query workloads via local subtree reconstruction, thereby avoiding the high cost of rebuilding the entire index. Experimental results demonstrate that LA-tree outperforms existing methods on multiple real-world and benchmark datasets. In static settings, the query time is reduced by an average of 52% compared with the optimal benchmark method, while in dynamic settings, the update costs are reduced by 97% compared with the reconstruction methods. Additionally, low query latency and lightweight index scale are maintained.

    参考文献
    相似文献
    引证文献
引用本文

刘佳伟,范举,张超,杜小勇. LA-tree: 查询感知的自适应学习型多维索引.软件学报,2026,37(2):485-507

复制
相关视频

分享
文章指标
  • 点击次数:
  • 下载次数:
  • HTML阅读次数:
  • 引用次数:
历史
  • 收稿日期:2025-09-01
  • 最后修改日期:2025-10-07
  • 录用日期:
  • 在线发布日期: 2025-12-10
  • 出版日期: 2026-02-06
文章二维码
您是第位访问者
版权所有:中国科学院软件研究所 京ICP备05046678号-3
地址:北京市海淀区中关村南四街4号,邮政编码:100190
电话:010-62562563 传真:010-62562533 Email:jos@iscas.ac.cn
技术支持:北京勤云科技发展有限公司

京公网安备 11040202500063号