Hybrid Instruction Scheduling Algorithm for RISC-V VLIW Architecture
Author:
Affiliation:

Clc Number:

TP314

  • Article
  • | |
  • Metrics
  • |
  • Reference [49]
  • |
  • Related [20]
  • | | |
  • Comments
    Abstract:

    Instruction level parallelism is a classic problem in the research of processor architecture. VLIW architecture is a common architecture to enhance Instruction level parallelism in the field of digital signal processor. The instruction issue order is determined by the compiler for VLIW architecture, so VLIW’s Instruction level parallelism performance strongly depends on the instruction scheduling of compiler. In order to explore the performance poteneial of RISC-V VLIW architecture and enrich the RISC-V ecosystem, this paper studies the optimization of instruction scheduling algorithm of RISC-V VLIW architecture. For a single scheduling region, the integer linear programming scheduling can obtain the optimal scheduling solution with high complexity, and the list scheduling, which has low complexity, cannot obtain the optimal scheduling solution. In order to combine the advantages of the two scheduling algorithms, this paper proposes an IPC theoretical model guided hybrid instruction scheduling algorithm. The scheduling region where the list scheduling has not reached the optimal solution can be located with IPC theoretical model, and then the integer linear programming scheduling algorithm further processes the located scheduling region. The theoretical model is based on data flow analysis and considers both instruction dependency and hardware resources, and can give the theoretical upper bound of IPC in linear complexity. The core of hybrid scheduling lies in the accuracy of IPC theoretical model, which is 95.74% in this paper. On the given benchmark, the IPC theoretical model can identify that 94.62% of the scheduling region has reached the optimal solution under list scheduling, so only 5.38% of the scheduling region needs to be further scheduled by integer linear programming. The hybrid scheduling algorithm can achieve the scheduling effect of integer linear programming scheduling with the complexity close to that of list scheduling.

    Reference
    [1] Waterman A, Asanovic K. The RISC-V Instruction Set Manual Volume I: Unprivileged IAS. Document Version 20191213[EB/OL]. https://riscv.org/wp-content/uploads/2019/12/riscv-spec-20191213.pdf. 2024-08-09.
    [2] Bao Y G, Sun N H. Opportunities and challenges of building CPU ecosystem with open-source mode. Bulletin of Chinese Academy of Sciences, 2022, 37(1): 24-29. (in Chinese)
    [3] Celio C, Patterson D A, Asanovic K. The berkeley out-of-order machine (boom): An industry-competitive, synthesizable, parameterized risc-v processor[J]. EECS Department, University of California, Berkeley, Tech. Rep. UCB/EECS-2015-167, 2015.
    [4] Li XL, Han M, Hao K, Xue HY, Lu SJ, Zhang KM, Qi N, Niu XM, Xiao LM, Hao QF. Design of RISC-V CPU for 100 Gbps Network Application[J]. Journal of Computer-Aided Design & Computer Graphics, 2021, 33(6): 956-962. DOI: 10.3724/SP.J.1089.2021.18538 (in Chinese with English abstract).
    [5] Hu ZB. Teach You to Design CPU by hand. RISC-V Processor [M]. Posts and Telecommunications Press, 2018.
    [6] Tine B, Yalamarthy K P, Elsabbagh F, Hyesoon K. Vortex: Extending the RISC-V ISA for GPGPU and 3D-graphics[C]//MICRO-54: 54th Annual IEEE/ACM International Symposium on Microarchitecture. 2021: 754-766.
    [7] Fisher J A. Very long instruction word architectures and the ELI-512[C]//Proceedings of the 10th annual international symposium on Computer architecture. 1983: 140-150.
    [8] Qualcomm. The Qualcomm Hexagon SDK[EB/OL]. https://www.qualcomm.com/developer/software/hexagon-npu-sdk. 2024-08-09.
    [9] Texas Instruments. TMS320C64x Technical Overview. https://www.ti.com/lit/ug/spru395b/spru395b.pdf. 2024-08-09.
    [10] WANG XQ, HONG Y, WANG H, ZHENG QL. Compiler Design and Optimization for BWDSP[J]. Acta Electronica Sinica, 2015, 43(8): 1656-1661. https://doi.org/10.3969/j.issn.0372-2112.2015.08.028 (in Chinese with English abstract).
    [11] Michael Larabel. Kalray VLIW processor family (kvx) [EB/OL]. https://www.phoronix.com/news/Kalray-KVX-Linux-Port . 2024-10-29.
    [12] Qui N M, Lin C H, Chen P. Design and implementation of a 256-bit RISC-V-based dynamically scheduled very long instruction word on FPGA[J]. IEEE Access, 2020, 8: 172996-173007.
    [13] Hennessy J L, Gross T. Postpass Code Optimization of Pipeline Constraints[J]. Acm Transactions on Programming Languages & Systems, 1983. DOI:https://doi.org /10.1145/2166.357217
    [14] Bernstein D, Gertner I. Scheduling expressions on a pipelined processor with a maximal delay of one cycle[J]. ACM Transactions on Programming Languages and Systems (TOPLAS), 1989, 11(1): 57-66.
    [15] Auyeung A, Gondra I, Dai H K. Multi-heuristic list scheduling genetic algorithm for task scheduling[C]//Proceedings of the 2003 ACM symposium on applied computing. 2003: 721-724.
    [16] Huang L, Feng XB. Survey on techniques of integrated instruction scheduling and register allocation[J]. Application Research of Computers. 2008, 25(4): 979-982. (in Chinese with English abstract).
    [17] Deng C, Chen Z, Shi Y, Ma Y, Wen M, Luo L. Optimizing VLIW Instruction Scheduling via a Two-Dimensional Constrained Dynamic Programming[J]. ACM Transactions on Design Automation of Electronic Systems, 2024.
    [18] Fisher J A. Trace scheduling: A technique for global microcode compaction[J]. IEEE transactions on computers, 1981, 30(07): 478-490.
    [19] Colwell R P, Nix R P, O'Donnell J J, Papworth D B, Rodman P K. A VLIW architecture for a trace scheduling compiler[J]. ACM SIGARCH Computer Architecture News, 1987, 15(5): 180-192.
    [20] Hwu, W. M. W., Mahlke, S. A., Chen, W. Y., Chang, P. P., Warter, N. J., Bringmann, R. A., ... & Lavery, D. M. The superblock: An effective technique for VLIW and superscalar compilation[M]//Instruction-Level Parallelism: A Special Issue of The Journal of Supercomputing. Boston, MA: Springer US, 2011: 229-248.
    [21] Mahlke S A, Lin D C, Chen W Y, Hank R E, Bringmann R A. Effective compiler support for predicated execution using the hyperblock[J]. ACM SIGMICRO Newsletter, 1992, 23(1-2): 45-54.
    [22] Giesemann F, Payá-Vayá G, Gerlach L, Blume H, Pflug F, von Voigt G. Using a genetic algorithm approach to reduce register file pressure during instruction scheduling[C]//2017 International Conference on Embedded Computer Systems: Architectures, Modeling, and Simulation (SAMOS). IEEE, 2017: 179-187.
    [23] Giesemann F, Gerlach L, Paya-Vaya G. Evolutionary algorithms for instruction scheduling, operation merging, and register allocation in VLIW compilers[J]. Journal of Signal Processing Systems, 2020, 92(7): 655-678.
    [24] Stuckmann F, Payá-Vayá G. A Graph Neural Network Approach to Improve List Scheduling Heuristics Under Register-Pressure[C]//2024 13th International Conference on Modern Circuits and Systems Technologies (MOCAST). IEEE, 2024: 01-06.
    [25] Six C, Boulmé S, Monniaux D. Certified and efficient instruction scheduling: application to interlocked VLIW processors[J]. Proceedings of the ACM on Programming Languages, 2020, 4(OOPSLA): 1-29.
    [26] Six C, Gourdin L, Boulmé S, Monniaux D, Fasse J, Nardino N. Formally verified superblock scheduling[C]//Proceedings of the 11th ACM SIGPLAN International Conference on Certified Programs and Proofs. 2022: 40-54.
    [27] Yang Z, Shirako J, Sarkar V. Fully Verified Instruction Scheduling[J]. Proceedings of the ACM on Programming Languages, 2024, 8(OOPSLA2): 791-816.
    [28] Herklotz Y, Wickerson J. Hyperblock Scheduling for Verified High-Level Synthesis[J]. Proceedings of the ACM on Programming Languages, 2024, 8(PLDI): 1929-1953.
    [29] Zhou ZX, He H, Zhang YJ, Yang X, Sun YH. Two-dimensional force-directed cluster scheduling algorithm for the clustered VLIW architecture[J]. Journal of Tsinghua University (Science and Technology) 2008, 48(10): 1647-1650. (in Chinese with English abstract).
    [30] Desoli G. Instruction assignment for clustered VLIW DSP compilers: A new approach[M]. Hewlett Packard Laboratories, 1998.
    [31] Porpodas V, Cintra M. CAeSaR: Unified cluster-assignment scheduling and communication reuse for clustered VLIW processors[C]//2013 International Conference on Compilers, Architecture and Synthesis for Embedded Systems (CASES). IEEE, 2013: 1-10.
    [32] Park J C H, Schlansker M. On predicated execution[M]. Palo Alto, California: Hewlett-Packard Laboratories, 1991.
    [33] Traber A, Zaruba F, Stucki S, Pullini A, Haugou G, Flamand E, Gürkaynak F K, Benini L. PULPino: A small single-core RISC-V SoC[C]//3rd RISC-V Workshop. 2016: 15.
    [34] riscv-collab. RISCV-GNU-Toolchain[EB/OL]. https://github.com/riscv-collab/riscv-gnu-toolchain. 2024-08-09.
    [35] llvm. The LLVM Compiler Infrastructure[EB/OL]. https://github.com/llvm/llvm-project. 2024-08-09.
    [36] riscv-software-src. Spike RISC-V ISA Simulator[EB/OL]. https://github.com/riscv-software-src/riscv-isa-sim. 2024-08-09.
    [37] Bellard F. QEMU, a fast and portable dynamic translator[C]//USENIX annual technical conference, FREENIX Track. 2005, 41(46): 10-5555.
    [38] Binkert N, Beckmann B, Black G, Reinhardt S K, Saidi A, Basu A, ... & Wood D A. The gem5 simulator[J]. ACM SIGARCH computer architecture news, 2011, 39(2): 1-7.
    [39] eembc. CoreMark is an industry-standard benchmark that measures the performance of central processing units (CPU) and embedded microcrontrollers (MCU) [EB/OL]. https://github.com/eembc/coremark. 2024-08-09.
    [40] Zha YQ. Research on Software-Hardware Co-Optimization Based on VLIW Architecture[D]. University of Chinese Academy of Science, 2024. (in Chinese with English abstract).
    [41] Dynmi. Implementation of AlexNet with C[EB/OL]. https://github.com/Dynmi/AlexNet. 2024-10-29.
    附中文参考文献:
    [2] 包云岗, 孙凝晖. 开源芯片生态技术体系构建面临的机遇与挑战. 中国科学院院刊, 2022, 37(1): 24-29.
    [4] 李晓霖, 韩萌, 郝凯, 薛海韵, 卢圣健, 张昆明, 祁楠, 牛星茂, 肖利民, 郝沁汾. 面向100 Gbps网络应用的RISC-V CPU设计与实现[J]. 计算机辅助设计与图形学学报, 2021, 33(6): 956-962.
    [5] 胡振波. 手把手教你设计CPU. RISC-V处理器篇[M]. 人民邮电出版社, 2018.
    [10] 王向前, 洪一, 王昊, 郑启龙. 魂芯DSP的编译器设计与优化[J]. 电子学报, 2015, 43(8): 1656-1661.
    [16] 黄磊, 冯晓兵. 结合的指令调度与寄存器分配技术[J]. 计算机应用研究, 2008(979-982).
    [29] 周志雄, 何虎, 张延军, 杨旭, 孙义和. 一种用于分簇 VLIW 结构的二维力量引导簇调度算法[J]. 清华大学学报: 自然科学版, 2008, 48(10): 1647-1650.
    [40] 查永权. 基于VLIW架构的软硬件协同优化研究[D]. 中国科学院大学, 2024.
    Cited by
    Comments
    Comments
    分享到微博
    Submit
Get Citation

李奕瑾,杜绍敏,赵家程,王雪莹,查永权,崔慧敏.基于RISC-V VLIW架构的混合指令调度算法.软件学报,2025,36(9):0

Copy
Share
Article Metrics
  • Abstract:193
  • PDF: 172
  • HTML: 0
  • Cited by: 0
History
  • Received:August 23,2024
  • Revised:November 20,2024
  • Online: December 10,2024
You are the first2034776Visitors
Copyright: Institute of Software, Chinese Academy of Sciences Beijing ICP No. 05046678-4
Address:4# South Fourth Street, Zhong Guan Cun, Beijing 100190,Postal Code:100190
Phone:010-62562563 Fax:010-62562533 Email:jos@iscas.ac.cn
Technical Support:Beijing Qinyun Technology Development Co., Ltd.

Beijing Public Network Security No. 11040202500063