基于数据分布的移动对象学习索引及查询算法
CSTR:
作者:
作者单位:

作者简介:

通讯作者:

中图分类号:

TP311

基金项目:

国家自然科学基金(62472217, U23A20296)


Moving Object Learned Index and Query Algorithm Based on Data Distribution
Author:
Affiliation:

Fund Project:

  • 摘要
  • |
  • 图/表
  • |
  • 访问统计
  • |
  • 参考文献
  • |
  • 相似文献
  • |
  • 引证文献
  • |
  • 资源附件
  • |
  • 文章评论
    摘要:

    移动对象的来源丰富、获取简单、运动频繁, 导致数据量呈现爆发式增长, 高效管理移动对象数据的需求日益增加, 使得移动对象数据的索引及查询成为亟待解决的热点问题. 传统的移动对象索引基于空间划分, 能够有效地处理对象的空间位置和时间变化, 但由于移动对象的动态特性需要频繁更新索引, 在对象数量庞大时会导致维护成本显著增加. 学习索引作为新型索引技术, 可以运用机器学习方法提高查询效率, 降低存储成本, 但学习索引并不适用于具有多维特性的移动对象数据. 为此, 提出了一种基于非均匀网格降维的学习索引NUGC_LI, 使用类似B+树的递归层次模型结构. 该学习索引分为根节点、内部节点和叶子节点这3个部分, 使用多阶段线性模型对灵活划分后的数据分布进行拟合学习, 并在叶子节点中设置有空隙的数组和节点关键值范围, 提高节点更新和查询效率. 同时, 对真实出租车轨迹、系统仿真火车轨迹和随机生成轨迹数据集分别建立了B+树、RMI、ALEX、NUGC_LI、3DR树与TB树索引. 真实数据集、仿真数据集和随机数据集中涉及的轨迹点分别约917000个、51544个和5222752个. 通过对比实验与伸缩性测试, 在索引构建上, NUGC_LI相较于TB树、3DR树、B+树、RMI和ALEX分别降低了约91.45%、89.63%、90.38%、87.46%及13.71%的构建时间; 在更新操作上, 其更新时间降低至少93.76%. 基于NUGC_LI的范围查询、最近邻查询和相似轨迹查询在大数据量条件下均显示出显著优势, 查询时间分别至少比ALEX降低8.74%、30%和16.07%; 比RMI降低29.38%、77.44%和25.24%; 比B+树降低52.72%、92.44%和70.5%; 比3DR树降低53.09%、91.2%和67.58%; 比TB树降低52.67%、90.43%和67.47%. NUGC_LI索引在多任务负载下不仅具备较高的扩展性, 而且在构建、更新以及查询操作中均实现了显著的性能提升.

    Abstract:

    The abundance of sources, ease of acquisition, and frequent movement of moving objects have led to exponential growth in data volume. The growing need for efficient management of moving object data has made indexing and querying such data a pressing issue. Traditional moving object indexes, based on spatial partitioning, can effectively handle changes in the spatial position and temporal dynamics of objects. However, due to the dynamic nature of moving objects, which require frequent index updates, maintaining these indexes becomes costly with large datasets. Learned indexes, as an emerging indexing technique, have the potential to improve query efficiency and reduce storage costs by leveraging machine learning methods. Nevertheless, learned indexes are not well-suited for data with multidimensional characteristics. To address this limitation, the proposed learned index uses the non-uniform grid code algorithm (NUGC_LI). It employs a recursive hierarchical model structure similar to the B+-tree, divided into root, internal, and leaf nodes. The learned index uses a multi-phase linear model to adapt to the flexibly divided data distribution, setting arrays with gaps and node key value ranges in the leaf nodes to improve node update and query efficiency. At the same time, B+-tree, RMI, ALEX, NUGC_LI, 3D R-tree, and TB-tree indexes are constructed for real taxi trajectories, system simulation train trajectories, and randomly generated trajectory datasets for comparison. The number of trajectory points in the real, simulated, and random datasets is approximately 917000, 51544, and 5222752, respectively. Through comparative experiments and scalability tests, NUGC_LI reduces the index construction time by approximately 91.45%, 89.63%, 90.38%, 87.46%, and 13.71% compared to TB-tree, 3D R-tree, B+-tree, RMI, and ALEX, respectively. For update operations, the update time is reduced by at least 93.76%. Range queries, nearest neighbor queries, and similar trajectory queries based on NUGC_LI show significant advantages under large-scale data conditions, with query times reduced by at least 8.74%, 30%, and 16.07% compared to ALEX; 29.38%, 77.44%, and 25.24% compared to RMI; 52.72%, 92.44%, and 70.5% compared to B+-tree; 53.09%, 91.2%, and 67.58% compared to 3D R-tree; and 52.67%, 90.43%, and 67.47% compared to TB-tree. The NUGC_LI index not only demonstrates high scalability under multi-task loads but also achieves significant performance improvements in construction, updates, and query operations.

    参考文献
    相似文献
    引证文献
引用本文

王撷阳,巢成,金鑫,许建秋,高云君.基于数据分布的移动对象学习索引及查询算法.软件学报,,():1-33

复制
相关视频

分享
文章指标
  • 点击次数:
  • 下载次数:
  • HTML阅读次数:
  • 引用次数:
历史
  • 收稿日期:2024-11-28
  • 最后修改日期:2025-04-02
  • 录用日期:
  • 在线发布日期: 2025-10-29
  • 出版日期:
文章二维码
您是第位访问者
版权所有:中国科学院软件研究所 京ICP备05046678号-3
地址:北京市海淀区中关村南四街4号,邮政编码:100190
电话:010-62562563 传真:010-62562533 Email:jos@iscas.ac.cn
技术支持:北京勤云科技发展有限公司

京公网安备 11040202500063号