Dynamic Data Protection System for Open Big Data Environment

    Abstract:

    Big data has become a basic national strategic resource, and data opening and sharing are at the core of China's big data strategy. Cloud-native technology and the lakehouse architecture are reshaping big data infrastructure and promoting data sharing and the dissemination of data value. The growth of the big data industry and its technologies demands stronger data security and data sharing capabilities. However, data security in open environments has become a bottleneck that restricts the development and use of big data technology, and data security and privacy protection issues have become increasingly prominent in both the open-source big data ecosystem and commercial big data systems. Dynamic data protection in an open big data environment faces challenges in data availability, processing efficiency, and system scalability, among others. This study proposes BDMasker, a dynamic data protection system for open big data environments. Using precise query analysis and query rewriting based on a query dependency model, BDMasker accurately perceives the original business request without changing its business semantics, so the entire dynamic masking (desensitization) process has zero impact on the business. Furthermore, its unified multi-engine security policy framework enables vertical extension of dynamic data protection capabilities and horizontal extension across multiple computing engines. BDMasker also uses the distributed computing capability of the underlying big data execution engine to improve data protection processing performance. Experimental results show that BDMasker's precise SQL analysis and rewriting technique is effective, that the system scales and performs well, and that overall performance fluctuates within 3% on the TPC-DS and YCSB benchmarks.
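The abstract describes two ideas: rewriting a query so that masking is applied transparently (the caller still receives the same output columns), and a single engine-neutral policy that is translated into each computing engine's own SQL. The following Python sketch illustrates that general pattern under simplifying assumptions and is not BDMasker's implementation. It assumes the query has already been parsed into projections, each annotated with the source columns it depends on, as a rough stand-in for the query dependency model. The policy table, the rule names `mask_phone` and `hash`, the `Projection` type, the `rule_for` and `rewrite_select` helpers, and the Spark SQL and Trino templates are all hypothetical choices made for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass

# Hypothetical engine-neutral policy: fully qualified column -> masking rule.
POLICY: dict[str, str] = {
    "users.phone": "mask_phone",
    "users.id_card": "hash",
}

# Hypothetical ordering: the stronger rule wins when one output column
# depends on several sensitive source columns.
RULE_STRENGTH = {"mask_phone": 1, "hash": 2}

# Hypothetical per-engine translation of each rule; {e} is the expression
# being masked. The same policy yields different SQL on each engine.
ENGINE_TEMPLATES: dict[str, dict[str, str]] = {
    "spark": {
        "mask_phone": "concat(substr({e}, 1, 3), '****', substr({e}, 8, 4))",
        "hash": "sha2(cast({e} as string), 256)",
    },
    "trino": {
        "mask_phone": "concat(substr({e}, 1, 3), '****', substr({e}, 8, 4))",
        "hash": "to_hex(sha256(to_utf8(cast({e} as varchar))))",
    },
}


@dataclass(frozen=True)
class Projection:
    """One output column of a SELECT and the source columns it depends on."""
    expr: str
    sources: tuple[str, ...]
    alias: str


def rule_for(proj: Projection, policy: dict[str, str]) -> str | None:
    """Return the strongest rule among the projection's sensitive sources, if any."""
    rules = [policy[s] for s in proj.sources if s in policy]
    if not rules:
        return None
    return max(rules, key=lambda r: RULE_STRENGTH[r])


def rewrite_select(projs: list[Projection], table: str, engine: str,
                   policy: dict[str, str] = POLICY) -> str:
    """Rebuild the SELECT, masking sensitive outputs while keeping every alias,
    so the result schema seen by the business application is unchanged."""
    templates = ENGINE_TEMPLATES[engine]
    items = []
    for p in projs:
        rule = rule_for(p, policy)
        expr = templates[rule].format(e=p.expr) if rule else p.expr
        items.append(f"{expr} AS {p.alias}")
    return f"SELECT {', '.join(items)} FROM {table}"


if __name__ == "__main__":
    query = [
        Projection("name", ("users.name",), "name"),
        Projection("phone", ("users.phone",), "phone"),
        Projection("concat(id_card, phone)", ("users.id_card", "users.phone"), "k"),
    ]
    print(rewrite_select(query, "users", "spark"))
    print(rewrite_select(query, "users", "trino"))
```

In this sketch, `name` passes through unchanged and `phone` is wrapped in a masking expression but keeps its alias. The derived column `k` depends on both `id_card` and `phone`, so the stronger `hash` rule is applied to the whole expression. The design choice illustrated is that masking happens on the dependency-annotated projection rather than on raw query text, and that switching engines only swaps the template table.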

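Tu YF, Niu JH, Wang DZ, Gao H, Xu J, Hong K, Yang F. Dynamic Data Protection System for Open Big Data Environment. Ruan Jian Xue Bao/Journal of Software, 2023, 34(3):1213-1235.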

History
  • Received: May 14, 2022
  • Revised: September 7, 2022
  • Online: October 26, 2022
  • Published: March 6, 2023
Copyright: Institute of Software, Chinese Academy of Sciences