Abstract: This paper first introduces SIMD (single instruction, multiple data) extension technology and presents three ways to use SIMD instructions. Of these, calling a third-party library that has been optimized for the target platform with those instructions is considered the most beneficial for developers. The paper then introduces the SW-1600, a CPU developed in China, and SW-VML, a software package of mathematical functions implemented with the SIMD extension technology. To reduce the additional overhead caused by unaligned address access and by conversion between vector and scalar arrays, the paper provides several performance optimization methods, such as aligned address access, simplification of vector conditional branches, and loop unrolling. An upgrade to SW-VML that supports multithreading with OpenMP is also offered. Finally, the functions in the package are tested on the SW-1600 using arrays of different sizes, and the test results show that SIMD vectorization achieves high performance. Compared with traditional scalar computation, the average speedup is up to 2.06. With 4 threads, the speedup of the package is up to 2.26 compared with a single thread. SW-VML is a general-purpose vector function package for the domestic Sunway processor series, and it can serve as a basic toolkit for high performance computing on the Sunway platform.
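The optimization methods named in the abstract can be illustrated with a minimal sketch in portable C. It is not SW-VML code and uses no SW-1600 intrinsics. The function names `safe_recip_ref` and `safe_recip_vec`, the example function (a reciprocal that returns 0 for zero inputs), and the vector width `VLEN` of 4 doubles are all assumptions chosen for illustration. The sketch peels scalar iterations until the input address is aligned, replaces the per-element branch with a select, unrolls the main loop by the assumed vector width, and distributes blocks across threads when compiled with OpenMP.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define VLEN 4                          /* assumed: 4 doubles per vector */
#define VBYTES (VLEN * sizeof(double))  /* assumed alignment boundary */

/* Scalar reference: y[i] = 1/x[i], or 0 where x[i] == 0. */
void safe_recip_ref(const double *x, double *y, size_t n) {
    for (size_t i = 0; i < n; i++) {
        if (x[i] != 0.0)
            y[i] = 1.0 / x[i];
        else
            y[i] = 0.0;
    }
}

/* One lane, written as select / compute / select instead of a branch:
   zero inputs get a harmless divisor, every lane divides, and the
   result for zero inputs is replaced afterwards. */
static inline double recip_lane(double v) {
    double d = (v != 0.0) ? v : 1.0;
    double r = 1.0 / d;
    return (v != 0.0) ? r : 0.0;
}

/* Vector-style version (hypothetical, for illustration only). */
void safe_recip_vec(const double *x, double *y, size_t n) {
    assert(n == 0 || (x != NULL && y != NULL));
    size_t i = 0;

    /* Peel scalar iterations until the input address is aligned. */
    while (i < n && ((uintptr_t)(x + i) % VBYTES) != 0) {
        y[i] = recip_lane(x[i]);
        i++;
    }

    /* Main body: one block of VLEN elements per iteration, unrolled. */
    const size_t start = i;
    const ptrdiff_t blocks = (ptrdiff_t)((n - start) / VLEN);
#ifdef _OPENMP
#pragma omp parallel for
#endif
    for (ptrdiff_t b = 0; b < blocks; b++) {
        const double *xs = x + start + (size_t)b * VLEN;
        double *ys = y + start + (size_t)b * VLEN;
        ys[0] = recip_lane(xs[0]);
        ys[1] = recip_lane(xs[1]);
        ys[2] = recip_lane(xs[2]);
        ys[3] = recip_lane(xs[3]);
    }

    /* Remainder: fewer than VLEN elements are left. */
    for (i = start + (size_t)blocks * VLEN; i < n; i++)
        y[i] = recip_lane(x[i]);
}
```

In this sketch only the input pointer is brought to an aligned boundary; a real vector implementation would also have to account for the alignment of the output array and would replace the four unrolled lanes with a single vector instruction.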