Abstract: High-performance libraries on CPUs can be accelerated by efficiently exploiting SIMD hardware through vectorization. Vectorized code, however, must be written with programming methods specific to the target SIMD hardware, and the programming models and methods of different SIMD extensions vary significantly. To avoid reimplementing the same algorithm optimizations for each platform and to improve the maintainability of algorithm libraries, high-performance libraries often introduce a hardware abstraction layer (HAL). Because existing SIMD extension instruction sets are designed around fixed-length vector registers, most HALs support only fixed-length vector types and operations. This fixed-length vector representation, however, cannot accommodate the variable vector register length introduced by the RISC-V Vector extension (RVV). Treating RVV as a fixed-length vector extension within an existing HAL design introduces unnecessary overhead and degrades performance. To address this problem, this paper proposes a HAL design method that is compatible with both variable-length vector extension platforms and fixed-length SIMD extension platforms. Based on this method, we redesigned and optimized the OpenCV universal intrinsic functions to better support RVV devices while maintaining compatibility with existing SIMD platforms. We also designed experiments comparing the performance of the OpenCV library optimized with our approach against the original version. The results demonstrate that the universal intrinsics redesigned with our method efficiently integrate RVV into the HAL optimization framework.
Our method achieved a 3.93x performance improvement in core modules, significantly improving the execution performance of the high-performance library on RISC-V devices and thereby validating its effectiveness. Additionally, our work has been open-sourced and integrated into the OpenCV source code, demonstrating the practicality and application value of our approach.
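To illustrate the difference between fixed-length and variable-length vector representations, here is a minimal, self-contained C++ sketch. It does not use OpenCV's universal intrinsics; the names `kFixedLanes`, `add_fixed`, and `add_scalable` are hypothetical, and plain inner loops stand in for single vector instructions. The fixed-length version assumes a compile-time lane count (4 float32 lanes, as in a 128-bit register) and therefore needs a scalar tail loop. The scalable version learns the maximum lane count `vlmax` only at run time and shortens the final vector instead, following a simplified model of RVV's `vsetvl` strip-mining.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// Hypothetical HAL sketch (not OpenCV's actual API).

// Fixed-length style: the lane count is a compile-time constant,
// e.g. 4 float32 lanes in a 128-bit SIMD register.
constexpr std::size_t kFixedLanes = 4;

// out[i] = a[i] + b[i] using full fixed-width "registers",
// followed by a scalar loop for the remaining n % kFixedLanes elements.
void add_fixed(const float* a, const float* b, float* out, std::size_t n) {
    std::size_t i = 0;
    for (; i + kFixedLanes <= n; i += kFixedLanes) {
        for (std::size_t l = 0; l < kFixedLanes; ++l)  // stands in for one vector add
            out[i + l] = a[i + l] + b[i + l];
    }
    for (; i < n; ++i)  // scalar tail
        out[i] = a[i] + b[i];
}

// Variable-length style: vlmax is known only at run time (on RVV it depends
// on the hardware's VLEN). Each iteration requests vl = min(vlmax, n - i)
// lanes, a simplified model of vsetvl, so no scalar tail loop is needed.
// Returns the number of vector iterations executed.
std::size_t add_scalable(const float* a, const float* b, float* out,
                         std::size_t n, std::size_t vlmax) {
    assert(vlmax > 0 && "vlmax must be positive");
    std::size_t iterations = 0;
    for (std::size_t i = 0; i < n;) {
        std::size_t vl = std::min(vlmax, n - i);  // lanes active in this iteration
        for (std::size_t l = 0; l < vl; ++l)      // stands in for one vector add of vl lanes
            out[i + l] = a[i + l] + b[i + l];
        i += vl;
        ++iterations;
    }
    return iterations;
}
```

Under this model, a HAL whose vector type carries a compile-time lane count pushes RVV code toward the first pattern, whereas exposing the lane count as a run-time query permits the second; for 10 elements with `vlmax = 8`, the scalable loop runs two iterations (8 lanes, then 2) with no scalar tail.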