Abstract: As privacy concerns gain increasing attention in today's machine learning (ML) world, building an ML serving system with data security guarantees has become critically important. Meanwhile, trusted execution environments (e.g., Intel SGX) have been widely used to develop trusted applications and systems. For instance, Intel SGX offers developers hardware-based secure containers (i.e., enclaves) that guarantee application confidentiality and integrity. This paper presents S3ML, an SGX-based secure serving system for ML inference. S3ML leverages Intel SGX to host ML models and protect users' privacy. To build a practical secure serving system, S3ML addresses several challenges in running model servers inside SGX enclaves. To ensure availability and scalability, a frontend ML inference service typically consists of many backend model server instances. When these instances run inside SGX enclaves, new system architectures and protocols are needed to synchronize cryptographic certificates and keys so that distributed secure enclave clusters can be constructed. S3ML introduces a dedicated module, the attestation-based enclave configuration service, which is responsible for generating, persisting, and distributing certificates and keys among clients and model server instances. Existing infrastructure can then be reused for transparent load balancing and failover, ensuring high service availability and scalability. In addition, SGX enclaves rely on a special memory region, the enclave page cache (EPC), which has a limited size and is contended by all enclaves on a host; the performance of SGX-based applications is therefore vulnerable to EPC interference. To satisfy the service-level objectives (SLOs) of ML inference services, S3ML first adopts lightweight ML frameworks and models to reduce EPC consumption. Furthermore, through offline analysis, we find that EPC paging throughput is a feasible indirect monitoring metric for SLO satisfaction. Based on this finding, S3ML uses real-time EPC paging information to control service load balancing and scaling activities. We implement S3ML based on Kubernetes, TensorFlow Lite, and Occlum, and demonstrate its system overhead, feasibility, and effectiveness through extensive experiments on a series of popular ML models.