S3ML: Secure Serving System for Machine Learning Inference
Author:
Affiliation:

Clc Number:

TP311

Fund Project:

  • Article
  • |
  • Figures
  • |
  • Metrics
  • |
  • Reference
  • |
  • Related
  • |
  • Cited by
  • |
  • Materials
  • |
  • Comments
    Abstract:

    As the privacy-preserving problem gains increasing concerns in today's machine learning (ML) world, constructing an ML serving system with a data security guarantee becomes very important. Meanwhile, trusted execution environments (e.g., Intel SGX) have been widely used for developing trusted applications and systems. For instance, Intel SGX offers developers hardware-based secure containers (i.e., enclaves) to guarantee application confidentiality and integrity. This paper presents S3ML, an SGX-based secure serving system for ML inference. S3ML leverages Intel SGX to host ML models for users' privacy protection. To build a practical secure serving system, S3ML addresses several challenges to run model servers inside SGX enclaves. In order to ensure availability and scalability, a frontend ML inference service typically consists of many backend model server instances. When these instances are running inside SGX enclaves, new system architectures and protocols are in need to synchronize cryptographic certificates and keys to construct distributed secure enclave clusters. A dedicated module is designed, it is called attestation-based enclave configuration service in S3ML, responsible for generating, persisting, and distributing certificates and keys among clients and model server instances. The existing infrastructure can then be reused to do transparent load balancing and failover to ensure service high-availability and scalability. Besides, SGX enclaves rely on a special memory region called the enclave page cache (EPC), which has a limited size and is contended by a host’s all enclaves. Therefore, the performance of SGX-based applications is vulnerable to EPC interferences. To satisfy the service-level objective (SLO) of ML inference services, S3ML first integrates lightweight ML framework/models to reduce EPC consumption. Furthermore, through offline analysis, it is found feasible to use EPC paging throughput as indirect monitoring metric to satisfy SLO. Based on this result, S3ML uses real-time EPC paging information to control service load balancing and scaling activities for SLO satisfaction. S3ML has been implemented based on Kubernetes, TensorFlow Lite, and Occlum. The system overhead, feasibility, and effectiveness of S3ML are demonstrated through extensive experiments on a series of popular ML models.

    Reference
    Related
    Cited by
Get Citation

马俊明,吴秉哲,余超凡,周爱辉,巫锡斌,陈向群. S3ML: 一种安全的机器学习推理服务系统.软件学报,2022,33(9):3312-3330

Copy
Share
Article Metrics
  • Abstract:
  • PDF:
  • HTML:
  • Cited by:
History
  • Received:March 23,2021
  • Revised:April 29,2021
  • Adopted:
  • Online: June 15,2022
  • Published: September 06,2022
You are the firstVisitors
Copyright: Institute of Software, Chinese Academy of Sciences Beijing ICP No. 05046678-4
Address:4# South Fourth Street, Zhong Guan Cun, Beijing 100190,Postal Code:100190
Phone:010-62562563 Fax:010-62562533 Email:jos@iscas.ac.cn
Technical Support:Beijing Qinyun Technology Development Co., Ltd.

Beijing Public Network Security No. 11040202500063