NUMA-conscious Foreign Key Join Optimization Technique

CLC number: TP311

Fund Project: National Key Research and Development Program of China (2023YFB4503600); National Natural Science Foundation of China (U23A20299, 62172424, 62276270, 62322214)

    Abstract:

    Non-uniform memory access (NUMA) is the mainstream memory access architecture on modern multicore, multi-socket processor platforms. Because NUMA access latency strongly affects database query performance, reducing the latency of cross-NUMA-node accesses during query processing is a key issue in modern in-memory database query optimization. Processors differ considerably in NUMA architecture and NUMA latency, so NUMA optimization techniques must take hardware characteristics into account. This study focuses on the in-memory foreign key join, the operator with the highest execution cost and the strongest dependence on data locality in in-memory databases, and explores different NUMA optimization techniques, including NUMA-conscious and NUMA-oblivious implementations, on five representative processor platforms with distinct NUMA architectures and latency characteristics: ARM, Intel CLX, Intel ICX, AMD Zen2, and AMD Zen3. Applying different optimization schemes for data storage, data partitioning, and caching of join intermediate results, the study compares algorithm performance across these processor architectures. Experimental results show that NUMA-conscious optimization must combine software and hardware considerations. Radix Join is neutral in its sensitivity to NUMA latency: its NUMA optimization gain stays consistently around 30% on all five platforms. The NPO algorithm is more sensitive to NUMA latency, with NUMA optimization gains ranging from 38% to 57% across platforms. Vector Join is sensitive to NUMA latency, but the effect is relatively small, with NUMA optimization gains of 1% to 25%; in terms of performance characteristics, Vector Join is affected more by cache efficiency than by NUMA latency. The performance differences among NUMA-conscious optimization techniques are large on the ARM platform but minimal on the x86 platforms, and the less complex NUMA-oblivious algorithms generalize better. Given hardware trends, reducing NUMA access latency can effectively narrow the performance gaps among NUMA-conscious optimization algorithms, simplify join algorithms, and improve join performance.
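The abstract compares three foreign key join algorithms without describing how they access data. The following Python sketch contrasts their access patterns under stated assumptions; it is illustrative only and is not the authors' implementation. It assumes the dimension (primary key) table uses dense surrogate keys 0..n-1, so that Vector Join can use a foreign key directly as an array index, treats each partition list as a stand-in for memory a NUMA-conscious implementation would place on one NUMA node, and reads NPO as a join over a single shared, non-partitioned hash table (our reading; the abstract does not define it). The function names `radix_partition`, `radix_join`, `no_partition_join`, and `vector_join` are hypothetical.

```python
# Illustrative sketch only (not the paper's code). Assumptions:
#   * dimension (primary key) rows use dense surrogate keys 0..n-1;
#   * each partition list stands in for memory that a NUMA-conscious
#     implementation would place on, and process from, one NUMA node;
#   * "NPO" is read here as a single shared, non-partitioned hash table.


def radix_partition(keys, payloads, bits):
    """Scatter (key, payload) pairs into 2**bits partitions by the low key bits."""
    mask = (1 << bits) - 1
    parts = [[] for _ in range(1 << bits)]
    for key, payload in zip(keys, payloads):
        parts[key & mask].append((key, payload))
    return parts


def radix_join(fact_fk, fact_payload, dim_pk, dim_payload, bits=2):
    """Radix Join: partition both inputs, then hash-join each partition pair.

    Matching keys always land in the same partition index, so partition i of
    the fact table only needs partition i of the dimension table.
    """
    fact_parts = radix_partition(fact_fk, fact_payload, bits)
    dim_parts = radix_partition(dim_pk, dim_payload, bits)
    result = []
    for fact_part, dim_part in zip(fact_parts, dim_parts):
        table = dict(dim_part)  # build on the primary-key side
        for key, payload in fact_part:  # probe with the foreign-key side
            if key in table:
                result.append((key, payload, table[key]))
    return result


def no_partition_join(fact_fk, fact_payload, dim_pk, dim_payload):
    """Non-partitioned hash join: one shared hash table, probed in input order."""
    table = dict(zip(dim_pk, dim_payload))
    return [(k, p, table[k]) for k, p in zip(fact_fk, fact_payload) if k in table]


def vector_join(fact_fk, fact_payload, dim_vector):
    """Vector Join: the foreign key is used directly as an index into a vector.

    dim_vector[k] holds the payload of dimension row k, or None if that row
    was filtered out, so no hashing or key comparison is needed.
    """
    result = []
    for key, payload in zip(fact_fk, fact_payload):
        value = dim_vector[key]
        if value is not None:
            result.append((key, payload, value))
    return result


if __name__ == "__main__":
    dim_pk, dim_payload = [0, 2, 3], ["a", "c", "d"]  # dimension row 1 filtered out
    dim_vector = ["a", None, "c", "d"]  # same dimension, stored positionally
    fact_fk, fact_payload = [3, 1, 1, 0, 2], [10, 20, 30, 40, 50]
    print(sorted(radix_join(fact_fk, fact_payload, dim_pk, dim_payload)))
    print(no_partition_join(fact_fk, fact_payload, dim_pk, dim_payload))
    print(vector_join(fact_fk, fact_payload, dim_vector))
```

All three functions return the same matches; `radix_join` emits them in partition order, the other two in fact-table order. In a NUMA-conscious variant, each partition pair would be handled by a thread running on the node that holds it, whereas `vector_join` makes random lookups into one shared vector, so its cost is plausibly dominated by cache behavior rather than by which node the vector lives on.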

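Cite this article:

Han Ruichen (韩瑞琛), Zhang Yansong (张延松), Liu Zhuan (刘专), Zhang Yu (张宇), Jiao Min (焦敏), Wang Shan (王珊). NUMA-conscious foreign key join optimization technique. Journal of Software (软件学报), 2025, 36(12): 5821-5850.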

History
  • Received: 2024-09-29
  • Last revised: 2024-12-22
  • Published online: 2025-07-17
  • Published: 2025-12-06
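Copyright: Institute of Software, Chinese Academy of Sciences (Beijing ICP Registration No. 05046678-3)
Address: No. 4 South Fourth Street, Zhongguancun, Haidian District, Beijing 100190, China
Tel: 010-62562563 Fax: 010-62562533 Email: jos@iscas.ac.cn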