Data Federation System for Multi-party Security
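Authors: LI Shu-Yuan, JI Yu-Dian, SHI Ding-Yuan, LIAO Wang-Dong, ZHANG Li-Peng, TONG Yong-Xin, XU Ke
About the authors: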

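LI Shu-Yuan (1998-), female, Ph.D. candidate, CCF student member. Her research interests include big data analysis and processing, and privacy protection.
ZHANG Li-Peng (1997-), male, master's student, CCF student member. His research interests include federated databases and privacy protection.
JI Yu-Dian (1991-), male, Ph.D., engineer. His research interests include spatio-temporal data analysis and processing, data compression, and data mining.
TONG Yong-Xin (1982-), male, Ph.D., professor, doctoral supervisor, CCF senior member. His research interests include federated learning, spatio-temporal big data analysis and processing, smart cities, crowdsourcing, crowd intelligence, and privacy protection.
SHI Ding-Yuan (1998-), male, master's student. His research interests include federated learning, spatio-temporal big data analysis and processing, crowdsourcing, crowd intelligence, and privacy protection.
XU Ke (1971-), male, Ph.D., professor, doctoral supervisor. His research interests include algorithms and artificial intelligence.
LIAO Wang-Dong (1996-), male, master's student. His research interests include big data analysis and processing, and privacy protection.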

Corresponding author:

TONG Yong-Xin, E-mail: yxtong@buaa.edu.cn

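Funding:

National Key Research and Development Program of China (2018AAA0101100); National Natural Science Foundation of China (61822201, U1811463, 62076017, 61690202); Beijing Municipal Science and Technology Program (Z191100002519012); CCF-Huawei Database Innovation Research Program (CCF-HuaweiDBIR2020008B); Open Program of the State Key Laboratory of Software Development Environment (Beihang University) (SKLSDE-2020ZX-07)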


Data Federation System for Multi-party Security
Authors:
  • LI Shu-Yuan

    State Key Laboratory of Software Development Environment (Beihang University), Beijing 100191, China; Beijing Advanced Innovation Center for Big Data and Brain Computing (Beihang University), Beijing 100191, China; School of Computer Science and Engineering, Beihang University, Beijing 100191, China
  • JI Yu-Dian

    Information Center, Ministry of Science and Technology, Beijing 100862, China
  • SHI Ding-Yuan

    State Key Laboratory of Software Development Environment (Beihang University), Beijing 100191, China; Beijing Advanced Innovation Center for Big Data and Brain Computing (Beihang University), Beijing 100191, China; School of Computer Science and Engineering, Beihang University, Beijing 100191, China
  • LIAO Wang-Dong

    State Key Laboratory of Software Development Environment (Beihang University), Beijing 100191, China; Beijing Advanced Innovation Center for Big Data and Brain Computing (Beihang University), Beijing 100191, China; School of Computer Science and Engineering, Beihang University, Beijing 100191, China
  • ZHANG Li-Peng

    State Key Laboratory of Software Development Environment (Beihang University), Beijing 100191, China; Beijing Advanced Innovation Center for Big Data and Brain Computing (Beihang University), Beijing 100191, China; School of Computer Science and Engineering, Beihang University, Beijing 100191, China
  • TONG Yong-Xin

    State Key Laboratory of Software Development Environment (Beihang University), Beijing 100191, China; Beijing Advanced Innovation Center for Big Data and Brain Computing (Beihang University), Beijing 100191, China; School of Computer Science and Engineering, Beihang University, Beijing 100191, China
  • XU Ke

    State Key Laboratory of Software Development Environment (Beihang University), Beijing 100191, China; Beijing Advanced Innovation Center for Big Data and Brain Computing (Beihang University), Beijing 100191, China; School of Computer Science and Engineering, Beihang University, Beijing 100191, China
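    Abstract:

    In the era of big data, data is an important factor of production with significant value. Enabling the analysis, mining, and utilization of large-scale data through data sharing is therefore of great importance. However, increasingly strict privacy and security requirements in recent years prevent multiple parties holding dispersed, heterogeneous data from sharing it freely, which aggravates the “data silo” problem. Data federation allows multiple data owners to complete joint queries while protecting privacy. This work therefore implements a multi-party secure relational data federation system based on the federated computing principle that “data stays, computation moves”. The system adapts to multiple relational databases and hides the data heterogeneity of the underlying data owners from users. Based on secret sharing, the system implements a library of basic multi-party secure operators and optimizes the operators' result reconstruction process to improve their execution efficiency. On this basis, the system supports query operations including sum, average, maximum and minimum, equi-join, and arbitrary (theta) join, and it exploits the multi-party setting to reduce data interaction among data owners, lowering security overhead and thus effectively supporting efficient data sharing. Finally, experiments on the TPC-H benchmark show that, compared with the existing data federation systems SMCQL and Conclave, the system supports more participating data owners and achieves higher execution efficiency on multiple query operations, outperforming the existing systems by up to 3.75 times.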
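    The abstract states that the operator library is built on secret sharing and supports aggregates such as sum and average, but it does not spell out the protocol. The following is a minimal sketch of one common approach, additive secret sharing over a prime field, assuming honest-but-curious data owners and non-negative integer inputs whose total stays below the modulus. `PRIME`, `share`, `reconstruct`, `secure_sum`, and `secure_average` are hypothetical names chosen for illustration, not the system's API, and the sketch does not reproduce the paper's optimized result reconstruction.

```python
import secrets

# Hypothetical public modulus (the Mersenne prime 2^61 - 1); all share
# arithmetic is done modulo this prime.
PRIME = 2**61 - 1


def share(value: int, n_parties: int) -> list[int]:
    """Split `value` into n additive shares that sum to `value` mod PRIME.

    Any n-1 of the shares are uniformly random and reveal nothing about `value`.
    """
    shares = [secrets.randbelow(PRIME) for _ in range(n_parties - 1)]
    shares.append((value - sum(shares)) % PRIME)
    return shares


def reconstruct(shares: list[int]) -> int:
    """Recover the shared value by adding all shares mod PRIME."""
    return sum(shares) % PRIME


def secure_sum(private_values: list[int]) -> int:
    """Sum one private value per data owner, revealing only the total."""
    n = len(private_values)
    # Owner i splits its local value; share j is sent to party j.
    all_shares = [share(v, n) for v in private_values]
    # Party j adds the shares it received locally; raw data never leaves its owner.
    partial_sums = [sum(row[j] for row in all_shares) % PRIME for j in range(n)]
    # Only the aggregate is reconstructed from the partial sums.
    return reconstruct(partial_sums)


def secure_average(local_sums: list[int], local_counts: list[int]) -> float:
    """Average = secure total / secure count (both aggregates are revealed)."""
    return secure_sum(local_sums) / secure_sum(local_counts)


if __name__ == "__main__":
    # Three hypothetical owners with local SUM(x) and COUNT(*) values.
    print(secure_sum([120, 45, 300]))                 # 465
    print(secure_average([120, 45, 300], [4, 3, 8]))  # 31.0
```

    Each party only ever sees uniformly random shares, so no owner learns another owner's local value; only the final aggregate is opened. MAX/MIN and joins additionally require secure comparison or equality protocols, which this sketch does not cover.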

    Abstract:

    In the era of big data, data is of great value as an essential factor of production, so analyzing, mining, and utilizing large-scale data via data sharing is of great significance. However, because data are heterogeneous and dispersed across owners, and privacy protection regulations are increasingly rigorous, data owners cannot freely share their data. This dilemma turns data owners into data silos. Data federation enables collaborative queries across data silos while preserving their privacy. This study implements a multi-party secure relational data federation system designed around the federated computation principle that “data stays, computation moves”. The system provides adapters for different kinds of relational databases, which shield users from the data heterogeneity of multiple data owners. It implements a library of basic multi-party secure operators based on secret sharing and optimizes the operators' result reconstruction process. On this basis, it supports query operations such as sum, average, maximum and minimum, equi-join, and theta-join. By making full use of the multi-party setting to reduce data interaction among data owners, the system reduces security computation overhead and thus effectively supports efficient data sharing. Finally, experiments are carried out on the TPC-H benchmark. The results show that, compared with the existing data federation systems SMCQL and Conclave, the proposed system supports more participating data owners and achieves higher execution efficiency on multiple query operations, with speedups of up to 3.75 times.
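    The abstract does not say which secret-sharing scheme underlies the operator library or how its result reconstruction is optimized. To illustrate what result reconstruction involves, here is a minimal sketch of Shamir's (t, n) threshold scheme [10], in which any t of n shares determine a value through Lagrange interpolation at x = 0. Because the scheme is linear, parties can add their shares locally and reconstruct only the final sum. The modulus `PRIME` and the functions `shamir_share`, `lagrange_reconstruct`, and `shamir_secure_sum` are hypothetical and are not taken from the paper.

```python
import secrets

# Hypothetical public prime field modulus (Mersenne prime 2^61 - 1).
PRIME = 2**61 - 1


def shamir_share(secret: int, n: int, t: int) -> list[tuple[int, int]]:
    """Create n shares (x, f(x)) of a random degree-(t-1) polynomial with f(0) = secret."""
    coeffs = [secret % PRIME] + [secrets.randbelow(PRIME) for _ in range(t - 1)]

    def f(x: int) -> int:
        acc = 0
        for c in reversed(coeffs):  # Horner's rule, highest degree first
            acc = (acc * x + c) % PRIME
        return acc

    return [(x, f(x)) for x in range(1, n + 1)]


def lagrange_reconstruct(points: list[tuple[int, int]]) -> int:
    """Interpolate f(0) from t shares with distinct x coordinates."""
    secret = 0
    for i, (xi, yi) in enumerate(points):
        num, den = 1, 1
        for j, (xj, _) in enumerate(points):
            if i != j:
                num = num * (-xj) % PRIME
                den = den * (xi - xj) % PRIME
        # pow(den, -1, PRIME) is the modular inverse (Python 3.8+).
        secret = (secret + yi * num * pow(den, -1, PRIME)) % PRIME
    return secret


def shamir_secure_sum(values: list[int], n: int, t: int) -> int:
    """Each owner shares its value with n parties; party k adds its shares locally."""
    per_owner = [shamir_share(v, n, t) for v in values]
    # Party k holds x = k + 1 from every owner; summing the y values gives a
    # share of the sum polynomial, whose constant term is sum(values).
    summed = [(k + 1, sum(s[k][1] for s in per_owner) % PRIME) for k in range(n)]
    # Any t parties suffice; here the first t are used.
    return lagrange_reconstruct(summed[:t])


if __name__ == "__main__":
    shares = shamir_share(1234, n=5, t=3)
    print(lagrange_reconstruct(shares[:3]))           # 1234
    print(lagrange_reconstruct(shares[2:]))           # 1234
    print(shamir_secure_sum([10, 20, 30], n=5, t=3))  # 60
```

    Fewer than t shares reveal nothing about the secret, while any t of them reconstruct it, so the aggregate can be opened without contacting every party.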

    References
    [1] Doan AH, Halevy A, Ives Z. Principles of Data Integration. Elsevier, 2012.
    [2] Shi DY, Wang YS, Zheng PF, Tong YX. Cross-Silo federated learning-to-rank. Ruan Jian Xue Bao/Journal of Software, 2021, 32(3):669-688(in Chinese with English abstract). http://www.jos.org.cn/1000-9825/6174.htm [doi:10.13328/j.cnki.jos.006174]
    [3] Liu C, Wang XS, Nayak K, et al. ObliVM:A programming framework for secure computation. In:Proc. of the 2015 IEEE Symp. on Security and Privacy. IEEE, 2015. 359-376.
    [4] Zahur S, Evans D. Obliv-C:A language for extensible data-oblivious computation. IACR Cryptology ePrint Archive, 2015, 2015: No.1153.
    [5] Bater J, Elliott G, Eggen C, et al. SMCQL:Secure query processing for private data networks. Proc. of the VLDB Endowment, 2017, 10(6):673-684.
    [6] Volgushev N, Schwarzkopf M, Getchell B, et al. Conclave:Secure multi-party computation on big data. In:Proc. of the 14th EuroSys Conf. ACM, 2019. No.3.
    [7] Hastings M, Hemenway B, Noble D, et al. Sok:General purpose compilers for secure multi-party computation. In:Proc. of the 2019 IEEE Symp. on Security and Privacy. IEEE, 2019. 1220-1237.
    [8] Bogdanov D, Laur S, Willemson J. Sharemind:A framework for fast privacy-preserving computations. In:Proc. of the 2008 European Symp. on Research in Computer Security. Berlin, Heidelberg:Springer, 2008. 192-206.
    [9] Keller M. MP-SPDZ:A versatile framework for multi-party computation. In:Proc. of the 2020 ACM SIGSAC Conf. on Computer and Communications Security. ACM, 2020. 1575-1590.
    [10] Shamir A. How to share a secret. Communications of the ACM, 1979, 22(11):612-613.
    [11] Yao AC. Protocols for secure computations. In:Proc. of the 23rd Annual Symp. on Foundations of Computer Science. IEEE, 1982. 160-164.
    [12] TPC-H. 1988. http://www.tpc.org/tpch/
    [13] Wang Y, Yi K. Secure Yannakakis:Join-aggregate queries over private data. In:Proc. of the 2021 Int'l Conf. on Management of Data. ACM, 2021. 1969-1981.
    [14] Sheth AP, Larson JA. Federated database systems for managing distributed, heterogeneous, and autonomous databases. ACM Computing Surveys, 1990, 22(3):183-236.
    [15] Josifovski V, Schwarz P, Haas L, et al. Garlic:A new flavor of federated query processing for DB2. In:Proc. of the 2002 ACM SIGMOD Int'l Conf. on Management of Data. ACM, 2002. 524-532.
    [16] Bellare M, Hoang VT, Rogaway P. Foundations of garbled circuits. In:Proc. of the 2012 ACM Conf. on Computer and Communications Security. ACM, 2012. 784-796.
    [17] Beimel A. Secret-sharing schemes:A survey. In:Proc. of the 2011 Int'l Conf. on Coding and Cryptology. Berlin, Heidelberg:Springer, 2011. 11-46.
    [18] Setty S, Vu V, Panpalia N, et al. Taking proof-based verified computation a few steps closer to practicality. In:Proc. of the 21st USENIX Security Symp. USENIX Association, 2012. 253-268.
    [19] Applebaum B. Key-dependent message security:Generic amplification and completeness. In:Proc. of the 2011 Annual Int'l Conf. on the Theory and Applications of Cryptographic Techniques. Berlin, Heidelberg:Springer, 2011. 527-546.
    [20] Chen F, Cheng S, Mohammed N, et al. Precise:Privacy-preserving cloud-assisted quality improvement service in healthcare. In: Proc. of the 8th Int'l Conf. on Systems Biology. IEEE, 2014. 176-183.
    [21] Kolesnikov V, Sadeghi AR, Schneider T. Improved garbled circuit building blocks and applications to auctions and computing minima. In:Proc. of the 2009 Int'l Conf. on Cryptology and Network Security. Berlin, Heidelberg:Springer, 2009. 1-20.
    [22] Kim HJ, Kim HI, Chang JW. A privacy-preserving kNN classification algorithm using Yao's garbled circuit on cloud computing. In:Proc. of the 2017 IEEE 10th Int'l Conf. on Cloud Computing. IEEE, 2017. 766-769.
    [23] Yao ACC. How to generate and exchange secrets. In:Proc. of the 27th Annual Symp. on Foundations of Computer Science. IEEE, 1986. 162-167.
    [24] Kilian J. Founding cryptography on oblivious transfer. In:Proc. of the 20th Annual ACM Symp. on Theory of Computing. ACM, 1988. 20-31.
    [25] Huang W, Langberg M, Kliewer J, et al. Communication efficient secret sharing. IEEE Trans. on Information Theory, 2016, 62(12): 7195-7206.
    [26] D'Souza R, Jao D, Mironov I, et al. Publicly verifiable secret sharing for cloud-based key management. In:Proc. of the 2011 Int'l Conf. on Cryptology in India. Berlin, Heidelberg:Springer, 2011. 290-309.
    [27] Naor M, Wool A. Access control and signatures via quorum secret sharing. IEEE Trans. on Parallel and Distributed Systems, 1998, 9(9):909-922.
    [28] Schoenmakers B. A simple publicly verifiable secret sharing scheme and its application to electronic voting. In:Proc. of the 1999 Annual Int'l Cryptology Conf. Berlin, Heidelberg:Springer, 1999. 148-164.
    [29] Zhu Y, Yang YT, Sun ZW, Feng DG. Ownership proofs of digital works based on secure multiparty computation. Ruan Jian Xue Bao/Journal of Software, 2006, 17(1):157-166(in Chinese with English abstract). http://www.jos.org.cn/1000-9825/17/157.htm [doi:10.1360/jos170157]
    [30] Tan ZW, Zhang LF. Survey on privacy preserving techniques for machine learning. Ruan Jian Xue Bao/Journal of Software, 2020, 31(7):2127-2156(in Chinese with English abstract). http://www.jos.org.cn/1000-9825/6052.htm [doi:10.13328/j.cnki.jos.006052]
    [31] Blakley GR. Safeguarding cryptographic keys. In:Proc. of the Int'l Workshop on Managing Requirements Knowledge. IEEE Computer Society, 1979. 313-313.
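Cite this article:

Li SY, Ji YD, Shi DY, Liao WD, Zhang LP, Tong YX, Xu K. Data federation system for multi-party security. Ruan Jian Xue Bao/Journal of Software, 2022, 33(3):1111-1127(in Chinese with English abstract).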

History
  • Received: 2021-06-30
  • Last revised: 2021-07-31
  • Published online: 2021-10-21
  • Published: 2022-03-06