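Secure Multi-party Database Computing System Based on Serverless Computing

Corresponding author:

Jiawei Jiang (江佳伟), E-mail: jiawei.jiang@whu.edu.cn

CLC number:

TP311

Funding:

National Key Research and Development Program of China (2023YFB2703604); Key Research and Development Program of Hubei Province (2023BAB077, 2023BAB170); National Natural Science Foundation of China (62472327); Fundamental Research Funds for the Central Universities (2042023kf0219); CCF-Ant Research Fund (CCF-AFSG RF20230106)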
    Abstract:

    Secure computation over federated multi-party databases supports joint querying and joint modeling on the private data of multiple databases while preserving data privacy. Such a federation is typically a loosely organized group in which participating databases may go offline at any time. However, existing secure multi-party computation systems usually rely on privacy-preserving schemes such as secret sharing, which require participants to stay online, so system availability is poor. Moreover, a system that serves external users cannot know in advance how many users there will be or how fast requests will arrive. If it is deployed on a private cluster or on virtual machines rented from a cloud platform, latency rises during request bursts and resources are wasted when requests are few, so scalability is poor. With the development of cloud computing, serverless computing has emerged as a new cloud-native deployment paradigm with good elastic resource scaling. This work proposes a system architecture and an indirect communication scheme for serverless computing environments, and implements a highly scalable, highly available secure computation system for multi-party databases. The system tolerates database node dropouts and automatically scales its resources as user request traffic changes. A prototype is implemented on Alibaba Cloud and the OceanBase database and compared experimentally against existing systems. The results show that, for tasks such as low-frequency queries and horizontal modeling, the system outperforms existing systems in computational cost, system performance, and scalability, saving up to 78% of computational cost and improving system performance by up to 1.6 times. The system's shortcomings on tasks such as complex queries and vertical modeling are also analyzed.
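The abstract does not spell out the indirect communication scheme, so the following is only a minimal sketch of the general idea, under stated assumptions: parties exchange data through a shared store instead of talking to each other directly, and a stateless function combines whatever has been written. `MailboxStore`, `party_upload`, and `aggregate_round` are hypothetical names, the store is an in-memory stand-in for a cloud storage service, and the privacy-protection layer is deliberately omitted (values are written in the clear here, which a real system would not do).

```python
from dataclasses import dataclass, field


@dataclass
class MailboxStore:
    """In-memory stand-in for a shared cloud store (hypothetical, not the paper's API)."""
    objects: dict = field(default_factory=dict)

    def put(self, key: str, value: list) -> None:
        self.objects[key] = value

    def list_prefix(self, prefix: str) -> dict:
        return {k: v for k, v in self.objects.items() if k.startswith(prefix)}


def party_upload(store, round_id, party, local_sum, local_count):
    """A party writes its local partial result; it may go offline afterwards."""
    store.put(f"round-{round_id}/{party}", [local_sum, local_count])


def aggregate_round(store, round_id, min_parties):
    """Stateless step: combine the partial results that are present.

    Returns None when fewer than min_parties have reported, so the caller
    can retry later instead of blocking on offline parties.
    """
    parts = store.list_prefix(f"round-{round_id}/")
    if len(parts) < min_parties:
        return None
    total = sum(v[0] for v in parts.values())
    count = sum(v[1] for v in parts.values())
    return total / count


store = MailboxStore()
party_upload(store, 1, "db_a", 30.0, 3)
party_upload(store, 1, "db_b", 50.0, 2)
# "db_c" is offline and never uploads; the round still completes.
result = aggregate_round(store, 1, min_parties=2)
```

Because no party waits on another party's live connection, a database that drops out only reduces the number of contributions; the `min_parties` threshold is an assumed policy for deciding when a round has enough input.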

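Cite this article

Ma XY, Zhou XK, Zheng HY, Cui B, Xu QQ, Yang CH, Yan X, Jiang JW. Secure multi-party database computing system based on serverless computing. Ruan Jian Xue Bao/Journal of Software, 2025, 36(3): 1084-1106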

History
  • Received: 2024-05-27
  • Last revised: 2024-07-16
  • Published online: 2024-09-13