Lazy Shadow Paging Under the RISC-V Architecture
Corresponding author:

Luo Yingwei (罗英伟), E-mail: lyw@pku.edu.cn

Funding:

National Key Research and Development Program of China (2022YFB4500701); National Natural Science Foundation of China (62032008, 62032001, 62372011)

    Abstract:

    Memory virtualization is a core component of virtualization technology and directly affects overall virtual machine performance. Mainstream memory virtualization methods face a trade-off between the overhead of two-dimensional address translation and the overhead of page table synchronization. Traditional shadow paging uses an additional page table maintained by software to achieve address translation performance comparable to a native environment. However, shadow page table synchronization relies on write protection, and the resulting frequent VM-exits severely degrade system performance. Nested paging, by contrast, relies on hardware-assisted virtualization: the guest page table and the nested page table of the VM are loaded directly into the memory management unit (MMU), which avoids the page table synchronization overhead, but the two-dimensional page table walk significantly reduces address translation efficiency. Based on the privilege model and virtualization hardware features of the RISC-V architecture, this paper presents Lazy Shadow Paging (LSP), which reduces the page table synchronization overhead while retaining the address translation efficiency of shadow page tables. LSP first analyzes how the guest accesses its page table pages and binds page table synchronization to translation lookaside buffer (TLB) flush operations, reducing the number of VM-exits. It then exploits the fine-grained, interceptable TLB flushes of RISC-V to invalidate the shadow page table entries that need synchronization, deferring the software cost of synchronization to the first access to the affected page. In addition, LSP uses the new privilege model of RISC-V to design a fast path for TLB flush interception, further reducing the software overhead caused by VM-exits. Experiments show that, on the base RISC-V architecture, LSP reduces the number of VM-exits by up to 50% compared with traditional shadow paging in micro-benchmarks. On an architecture that supports the RISC-V virtualization extension, LSP reduces the number of VM-exits by up to 25% compared with traditional shadow paging for typical applications in the SPEC CPU2006 benchmark suite, and saves 12 memory accesses per TLB miss compared with nested paging.
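
    The figure of 12 memory accesses saved per TLB miss is consistent with the standard count for a two-dimensional page walk. The derivation below is a worked sketch under assumptions the abstract does not state: an n-level guest page table, an m-level nested page table, no page-walk caches, and n = m = 3 (as with Sv39 for the guest and Sv39x4 for the second stage). The symbols N_shadow and N_nested are introduced here for illustration only.

```latex
% Each of the n guest page table pointers, plus the final guest-physical
% address, needs an m-step nested walk; the n guest entries are also read.
\begin{align*}
N_{\text{nested}} &= (n + 1)(m + 1) - 1 = nm + n + m \\
N_{\text{shadow}} &= n \\
N_{\text{nested}} - N_{\text{shadow}} &= nm + m = m(n + 1) \\
n = m = 3:\quad N_{\text{nested}} &= 15,\quad N_{\text{shadow}} = 3,\quad
N_{\text{nested}} - N_{\text{shadow}} = 12
\end{align*}
```

    Under these assumptions a shadow page table walk costs the same 3 accesses as a native walk, while a nested walk costs 15, which gives the difference of 12.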

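    The difference between write-protection-based synchronization and synchronization bound to TLB flushes can be illustrated with a toy model. This is a minimal sketch, not the paper's implementation: the class `ToyShadowMMU`, its methods, the helper `run`, the fixed-offset guest-to-host mapping, and the example trace are all hypothetical, and the model only counts VM-exits.

```python
from dataclasses import dataclass, field


@dataclass
class ToyShadowMMU:
    """Toy model: a guest page table plus a software-maintained shadow copy."""
    lazy: bool
    guest_pt: dict = field(default_factory=dict)   # guest VPN -> guest frame
    shadow_pt: dict = field(default_factory=dict)  # guest VPN -> host frame
    vm_exits: int = 0

    def _to_host(self, gfn):
        # Hypothetical guest-physical to host-physical mapping (fixed offset).
        return gfn + 0x1000

    def write_pte(self, vpn, gfn):
        self.guest_pt[vpn] = gfn
        if not self.lazy:
            # Eager model: the guest page table is write-protected, so every
            # write traps to the hypervisor and is synchronized immediately.
            self.vm_exits += 1
            self.shadow_pt[vpn] = self._to_host(gfn)

    def tlb_flush(self, vpn):
        if self.lazy:
            # Lazy model: the fine-grained flush is intercepted and only
            # invalidates the shadow entry; the sync itself is deferred.
            self.vm_exits += 1
            self.shadow_pt.pop(vpn, None)
        # In the eager model the flush is assumed to run without a VM-exit.

    def access(self, vpn):
        if vpn not in self.shadow_pt:
            # Shadow page fault: synchronize on first access after invalidation.
            self.vm_exits += 1
            self.shadow_pt[vpn] = self._to_host(self.guest_pt[vpn])
        return self.shadow_pt[vpn]


def run(lazy):
    """Replay a small hypothetical trace and return the number of VM-exits."""
    mmu = ToyShadowMMU(lazy=lazy)
    for vpn in range(4):
        mmu.write_pte(vpn, gfn=100 + vpn)  # initial mapping
        mmu.write_pte(vpn, gfn=200 + vpn)  # remapped before any use
        mmu.tlb_flush(vpn)
    first = mmu.access(0)
    second = mmu.access(0)                 # already synchronized
    assert first == second == 200 + 0x1000
    return mmu.vm_exits


print(run(lazy=False), run(lazy=True))     # prints "8 5"
```

    In this trace the eager model pays one VM-exit per page table write (8 in total), while the lazy model pays one per intercepted flush plus one for the only page that is actually accessed (5 in total). The ratio depends entirely on the chosen trace and says nothing about the measured results reported in the abstract.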
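Cite this article

Li Chuandong, Yi Ran, Luo Yingwei, Wang Xiaolin, Wang Zhenlin. Lazy Shadow Paging Under the RISC-V Architecture. Journal of Software (软件学报), 2025,36(9):0 (in Chinese with English abstract)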

History
  • Received: 2024-08-25
  • Last revised: 2024-10-15
  • Published online: 2024-12-10