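Memory Isolation Mechanism of eBPF Based on PKS Hardware Feature
Authors: Li Hao, Gu Jinyu, Xia Yubin, Zang Binyu, Chen Haibo
Author biographies:

Li Hao (李浩, born 1999), male, master's student; main research areas: operating system architecture and security. Gu Jinyu (古金宇, born 1994), male, Ph.D., assistant researcher, CCF professional member; main research areas: operating systems and system security. Xia Yubin (夏虞斌, born 1982), male, Ph.D., associate professor, doctoral supervisor, CCF senior member; main research areas: computer architecture, operating systems, virtualization, and system security. Zang Binyu (臧斌宇, born 1965), male, Ph.D., professor, doctoral supervisor, CCF fellow; main research areas: operating systems and computer architecture. Chen Haibo (陈海波, born 1982), male, Ph.D., professor, doctoral supervisor, CCF distinguished member; main research areas: operating systems, parallel and distributed systems, virtualization, and system security.

Corresponding author:

Gu Jinyu, E-mail: gujinyu@sjtu.edu.cn

CLC number:

TP306

Funding:

National Science Fund for Distinguished Young Scholars (61925206); Huawei Innovation Program
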
    Abstract:

    The extended Berkeley packet filter (eBPF) mechanism in the Linux kernel allows untrusted, user-provided programs to be loaded into the kernel safely. Within this mechanism, the verifier checks each program and ensures that it cannot crash the kernel or maliciously access the kernel address space. eBPF has developed rapidly in recent years, and its verifier has grown more complex as new features have been added. This study observes two problems with the complex eBPF verifier. The first is the "false negative" problem: the verifier's complex security-check logic contains many bugs, and attackers can exploit them to craft malicious eBPF programs that pass verification and then attack the kernel. The second is the "false positive" problem: because the verifier checks programs statically, it lacks runtime information and can only check conservatively. As a result, programs that are actually safe may be rejected, and only restricted semantics are supported, which makes eBPF programs difficult to develop. Further analysis shows that the static simulated-execution check in the eBPF verifier has a large code base, high complexity, and conservative analysis, and that it is the main cause of both the security vulnerabilities and the false positives. This study therefore proposes replacing the static simulated-execution check in the eBPF verifier with lightweight dynamic checks. The bugs and conservative checks that simulated execution introduced into the verifier no longer exist, so many of the "false negative" and "false positive" problems above are eliminated. Specifically, the eBPF program runs in a kernel-mode sandbox that dynamically checks the program's memory accesses at runtime and prevents illegal access to kernel memory. To implement a lightweight kernel sandbox efficiently, a new hardware feature, Intel protection keys for supervisor (PKS), is used to check memory access instructions with zero overhead, and an efficient method of interaction between the kernel and the sandboxed eBPF program is proposed. The evaluation results show that the proposed method eliminates the memory safety vulnerabilities of the kernel eBPF verifier (since 2020, this type has accounted for more than 60% of all eBPF verifier vulnerabilities). Even in high-throughput network packet processing scenarios, the performance overhead of the lightweight kernel sandbox is below 3%.
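
    As an illustration of how protection keys can enforce the sandbox boundary described in the abstract, the following is a minimal user-space sketch in Python, not kernel code and not the authors' implementation. It models the PKS rule that each supervisor page carries one of 16 keys and that a per-CPU rights register (PKRS) holds two bits per key: access-disable (AD) and write-disable (WD). The key assignment (`KERNEL_KEY`, `SANDBOX_KEY`) and the helper names are assumptions made for this example, and the model simplifies by assuming write protection is always enforced (CR0.WP set) and by ignoring instruction fetches.

```python
# Toy model of PKS-style permission checks (illustrative only, not kernel code).
AD = 0b01       # access-disable bit within a key's 2-bit field
WD = 0b10       # write-disable bit within a key's 2-bit field
NUM_KEYS = 16   # PKS provides 16 protection keys

# Hypothetical key assignment for this sketch.
KERNEL_KEY = 0   # pages holding ordinary kernel data
SANDBOX_KEY = 1  # pages the sandboxed eBPF program may touch


def pkrs_set(pkrs: int, key: int, bits: int) -> int:
    """Return a new rights value with the 2-bit field of `key` set to `bits`."""
    shift = 2 * key
    return (pkrs & ~(0b11 << shift)) | (bits << shift)


def access_allowed(pkrs: int, key: int, is_write: bool) -> bool:
    """Model the hardware check applied to a data access to a page tagged `key`."""
    bits = (pkrs >> (2 * key)) & 0b11
    if bits & AD:
        return False          # any data access is denied
    if is_write and (bits & WD):
        return False          # reads allowed, writes denied
    return True


def sandbox_pkrs() -> int:
    """Rights value for sandboxed execution: every key except SANDBOX_KEY is access-disabled."""
    pkrs = 0
    for key in range(NUM_KEYS):
        if key != SANDBOX_KEY:
            pkrs = pkrs_set(pkrs, key, AD)
    return pkrs


if __name__ == "__main__":
    in_sandbox = sandbox_pkrs()
    print("sandbox write to own page:", access_allowed(in_sandbox, SANDBOX_KEY, True))
    print("sandbox read of kernel page:", access_allowed(in_sandbox, KERNEL_KEY, False))
    print("kernel write to kernel page:", access_allowed(0, KERNEL_KEY, True))
```

    In this model the check adds no instructions to the sandboxed program itself, since the permission test is part of address translation; only the switch between the kernel's rights value and the sandbox's rights value costs anything, which is consistent with the abstract's emphasis on an efficient kernel/sandbox interaction method.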

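Cite this article

Li H, Gu JY, Xia YB, Zang BY, Chen HB. Memory isolation mechanism of eBPF based on PKS hardware feature. Ruan Jian Xue Bao/Journal of Software, 2023, 34(12): 5921-5939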

History
  • Received: 2022-04-17
  • Last revised: 2022-07-18
  • Published online: 2023-02-15
  • Publication date: 2023-12-06