Optimization Method for High-Performance Libraries Targeting RISC-V Vector Extension

Corresponding author:

Wu Yanjun (武延军), E-mail: yanjun@iscas.ac.cn

CLC number:

TP311

Funding:

Strategic Priority Research Program of the Chinese Academy of Sciences (Category A) (XDA0320200)

    Abstract:

    High-performance algorithm libraries improve their execution performance on CPUs by using vectorization to exploit single instruction, multiple data (SIMD) hardware efficiently. Implementing vectorization requires programming methods specific to the target SIMD hardware, and the programming models and methods of different SIMD extensions differ considerably. To avoid reimplementing optimized algorithms for each platform and to improve maintainability, a hardware abstraction layer (HAL) is usually introduced during the development of such libraries. Because the mainstream SIMD instruction set extensions are all designed with fixed-length vector registers, most HALs are designed around fixed-length vectors and cannot express the variable vector register length introduced by the RISC-V vector extension. Treating the RISC-V vector extension as a fixed-length vector extension within an existing HAL design introduces unnecessary overhead and degrades performance. To address this problem, this paper proposes a HAL design method that covers both variable-length vector extension platforms and fixed-length SIMD extension platforms. Based on this method, the universal intrinsics of the OpenCV library are redesigned and optimized so that they better support RISC-V vector extension devices while remaining compatible with existing SIMD platforms. The OpenCV library optimized with this method is then compared against the original library. The experimental results show that universal intrinsics designed with this method integrate the RISC-V vector extension efficiently into the library's HAL optimization framework and achieve a 3.93x performance improvement in core modules, significantly improving the execution performance of the library on RISC-V devices and thereby validating the effectiveness of the method. In addition, this work has been open-sourced and integrated into the OpenCV source code by the OpenCV community, demonstrating the practicality and application value of the method.
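
    The contrast between fixed-length and length-agnostic abstraction layers can be illustrated with a small sketch. The code below is not the OpenCV universal intrinsics API and is not taken from the paper; `EmulatedBackend` and `add_arrays` are hypothetical names, and the "vector" operation is emulated with a scalar loop so the example runs on any machine. The assumption it illustrates is that the kernel asks the backend for its lane count at run time and handles the tail by reducing the number of active lanes, rather than relying on a compile-time constant width.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical scalar-emulated backend (not an OpenCV type). The lane count is
// a run-time value, as it is on a variable-length vector machine.
struct EmulatedBackend {
    std::size_t lane_count;

    explicit EmulatedBackend(std::size_t lanes) : lane_count(lanes) {
        assert(lanes > 0);
    }

    std::size_t lanes() const { return lane_count; }

    // Emulates one vector "load, add, store" over the first `active` lanes.
    void add(const float* a, const float* b, float* dst,
             std::size_t active) const {
        assert(active <= lane_count);
        for (std::size_t i = 0; i < active; ++i) {
            dst[i] = a[i] + b[i];
        }
    }
};

// Length-agnostic kernel: the stride comes from the backend at run time
// instead of from a compile-time constant. Returns the number of emulated
// vector operations issued.
template <typename Backend>
std::size_t add_arrays(const Backend& hw, const float* a, const float* b,
                       float* dst, std::size_t n) {
    const std::size_t step = hw.lanes();
    std::size_t ops = 0;
    std::size_t i = 0;
    for (; i + step <= n; i += step, ++ops) {
        hw.add(a + i, b + i, dst + i, step);
    }
    if (i < n) {  // tail: fewer active lanes, no separate scalar loop
        hw.add(a + i, b + i, dst + i, n - i);
        ++ops;
    }
    return ops;
}
```

    With 10 input elements, a backend reporting 4 lanes issues three emulated operations and a backend reporting 16 lanes issues one, while the kernel source stays the same. A layer that hard-codes the width would need a separate build or code path for each register length.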

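Cite this article

Han Liutong, Zhang Hongbin, Xing Mingjie, Wu Yanjun, Zhao Chen. Optimization Method for High-Performance Libraries Targeting RISC-V Vector Extension. Journal of Software (软件学报), 2025, 36(9):0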

History
  • Received: 2024-08-26
  • Last revised: 2024-11-20
  • Published online: 2024-12-10