Optimization Method for High-performance Libraries Targeting RISC-V Vector Extension

doi:10.13328/j.cnki.jos.007360

Ruan Jian Xue Bao/Journal of Software, 2025, 36(9): 3985–4005.

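Authors:
HAN Liu-Tong (Institute of Software, Chinese Academy of Sciences, Beijing 100190, China; University of Chinese Academy of Sciences, Beijing 100049, China)
ZHANG Hong-Bin (Institute of Software, Chinese Academy of Sciences, Beijing 100190, China; University of Chinese Academy of Sciences, Beijing 100049, China)
XING Ming-Jie (Institute of Software, Chinese Academy of Sciences, Beijing 100190, China)
WU Yan-Jun (Institute of Software, Chinese Academy of Sciences, Beijing 100190, China)
ZHAO Chen (Institute of Software, Chinese Academy of Sciences, Beijing 100190, China)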

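Corresponding author: WU Yan-Jun, E-mail: yanjun@iscas.ac.cn
CLC number: TP316
Foundation item: Strategic Priority Research Program of the Chinese Academy of Sciences (Category A) (XDA0320200)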

Abstract:

Vectorization lets high-performance libraries exploit SIMD hardware to accelerate their execution on CPUs. Implementing vectorization requires programming methods tailored to the target SIMD hardware, and these methods differ significantly across SIMD extensions. To avoid re-implementing optimized algorithms for every platform and to improve the maintainability of algorithm libraries, a hardware abstraction layer (HAL) is often introduced. However, because mainstream SIMD instruction set extensions are designed around fixed-length vector registers, most existing HAL designs also assume fixed-length vectors and cannot accommodate the variable-length vector registers introduced by the RISC-V vector extension. Treating the RISC-V vector extension as a fixed-length extension within a conventional HAL design incurs unnecessary overhead and degrades performance. To address this problem, this study proposes a HAL design method that supports both variable-length vector extensions and fixed-length SIMD extensions. Using this method, the universal intrinsic functions in the OpenCV library are redesigned and optimized to better support RISC-V vector extension devices while remaining compatible with existing SIMD platforms. Performance comparisons between the optimized and original OpenCV libraries show that the redesigned universal intrinsic functions efficiently integrate the RISC-V vector extension into the library's HAL optimization framework, achieving a 3.93× performance improvement in core modules and significantly improving the execution performance of high-performance libraries on RISC-V devices, which validates the effectiveness of the proposed method. In addition, the work has been open-sourced and integrated into the OpenCV source code by the OpenCV community, demonstrating its practicality and application value.

Key words: RISC-V vector extension; data-level parallelism; high-performance library optimization; open source computer vision library (OpenCV)

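Cite this article:

Han LT, Zhang HB, Xing MJ, Wu YJ, Zhao C. Optimization method for high-performance libraries targeting RISC-V vector extension. Ruan Jian Xue Bao/Journal of Software, 2025, 36(9): 3985–4005.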

History:
Received: 2024-08-26
Last revised: 2024-10-15
Published online: 2024-12-10
Published: 2025-09-06