Accelerator Virtualization Framework Based on Inter-VM Exitless Communication

Fund Project:

Key-area Research and Development Program of Guangdong Province of China (2020B010164003); National Science Fund for Distinguished Young Scholars (61925206); HighTech Support Program from Shanghai Committee of Science and Technology (19511121100)

    Abstract:

    The increasing deployment of artificial intelligence has placed unprecedented demands on the computing power of cloud platforms. Cloud service providers have therefore integrated accelerators with massively parallel computing units into their data centers, and these accelerators must be combined with existing virtualization platforms so that their computing resources can be partitioned. The current mainstream accelerator virtualization solution is PCI passthrough, which does not support fine-grained resource provisioning. Some manufacturers have also started to provide time-sliced multiplexing schemes, in which drivers cooperate with specific hardware to divide resources and time slices among virtual machines; these schemes suffer from poor portability and flexibility. A promising alternative is API forwarding, which uses a split driver model to forward a virtual machine's requests to a back-end driver for processing. However, the communication introduced by API forwarding can easily become the performance bottleneck. This study proposes Wormhole, an accelerator virtualization framework based on the C/S architecture that supports rapid delegated execution across virtual machines. It aims to give upper-level users an efficient and transparent way to speed up API-forwarding-based accelerator virtualization while ensuring strong isolation between multiple users. By leveraging hardware virtualization features, the framework minimizes performance degradation through exitless cross-VM control flow switching. Experimental results show that the Wormhole prototype achieves up to a 5-fold performance improvement over classic open-source virtualization solutions such as GVirtuS in training tests of classic models.
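    The API forwarding pattern described in the abstract can be illustrated with a minimal sketch. This is not Wormhole's implementation: the names `FrontendStub`, `backend_serve`, and `vector_add` are hypothetical, JSON is an assumed wire format, and the transport is a plain in-process function call standing in for whatever cross-VM channel a real system would use. The exitless control flow switch itself is not modeled here.

```python
import json
from typing import Callable, Dict

# Back-end side: table of "driver" functions that do the real work.
# All names here are hypothetical stand-ins for accelerator API calls.
HANDLERS: Dict[str, Callable[..., object]] = {}


def register(name: str):
    """Decorator that exposes a back-end function under an API name."""
    def wrap(fn):
        HANDLERS[name] = fn
        return fn
    return wrap


@register("vector_add")
def vector_add(a, b):
    # Stand-in for work that would run on the accelerator.
    return [x + y for x, y in zip(a, b)]


def backend_serve(request: bytes) -> bytes:
    """Decode one forwarded request, dispatch it, and encode the reply."""
    msg = json.loads(request.decode("utf-8"))
    handler = HANDLERS.get(msg["api"])
    if handler is None:
        reply = {"ok": False, "error": "unknown api " + msg["api"]}
    else:
        reply = {"ok": True, "result": handler(*msg["args"])}
    return json.dumps(reply).encode("utf-8")


class FrontendStub:
    """Guest-side stub: marshals each call and sends it over a transport."""

    def __init__(self, transport: Callable[[bytes], bytes]):
        # The transport is assumed to carry bytes to the back end and
        # return the reply bytes; here it is a direct function call.
        self._transport = transport

    def call(self, api: str, *args):
        request = json.dumps({"api": api, "args": list(args)}).encode("utf-8")
        reply = json.loads(self._transport(request).decode("utf-8"))
        if not reply["ok"]:
            raise RuntimeError(reply["error"])
        return reply["result"]


stub = FrontendStub(backend_serve)
result = stub.call("vector_add", [1, 2, 3], [10, 20, 30])
```

    Every call crosses the transport twice (request and reply), which is why the cost of that crossing tends to dominate in API-forwarding designs and why the abstract identifies communication as the bottleneck.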

Get Citation

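Li DJ, Mi ZY, Wu BD, Chen X, Zhao YW, Ding ZH, Chen HB. Accelerator virtualization framework based on inter-VM exitless communication. Ruan Jian Xue Bao/Journal of Software, 2020,31(10):3019-3037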

History
  • Received: February 10, 2020
  • Revised: April 4, 2020
  • Online: June 11, 2020
  • Published: October 6, 2020