一种基于格的隐私保护聚类数据挖掘方法
作者:
基金项目:

国家自然科学基金(61232002,61572378,61202034);CCF中文信息技术开放课题(CCF2014-01-02);武汉市创新团队项目(2014070504020237);武汉大学自主科研项目(2042016gf0020,2016-2017)


Privacy Preserving Cluster Mining Method Based on Lattice
Author:
Fund Project:

National Natural Science Foundation of China (61232002, 61572378, 61202034); CCF Chinese information technology open topic (CCF2014-01-02); Wuhan Innovation Team Project (2014070504020237); Wuhan University independent research project(2042016gf0020, 2016-2017)

  • 摘要
  • | |
  • 访问统计
  • |
  • 参考文献 [31]
  • |
  • 相似文献 [20]
  • | | |
  • 文章评论
    摘要:

    由于云计算的诸多优势,用户倾向于将数据挖掘和数据分析等业务外包到专业的云服务提供商,然而随之而来的是用户的隐私不能得到保证.目前,众多学者关注云环境下敏感数据存储的隐私保护问题,而隐私保护数据分析的相关研究还比较少.但是如果仅仅为了保护数据隐私,而不对大数据进行挖掘分析,大数据也就失去了其潜在的巨大价值.提出了一种云计算环境下基于格的隐私保护数据挖掘方法,利用格加密构建隐私数据的安全同态运算方法,并且在此基础上实现了支持隐私保护的云端密文数据聚类分析数据挖掘服务.为保护用户数据隐私,用户将数据加密之后发布给云服务提供商,云服务提供商利用基于格的同态加密算法实现隐私保护的k-means、隐私保护层次聚类以及隐私保护DBSCAN数据挖掘服务,但云服务提供商并不能直接访问用户数据破坏用户隐私.与现有的隐私数据发布方法相比,隐私数据发布基于格的最接近向量困难问题(CVP)和最短向量困难问题(SVP)具有很高的安全性.同时,有效保持了密文数据间距离的精确性.与现有研究相比,挖掘结果也具有更高的精确性和可用性.对方法的安全性进行了理论分析,并设计实验对提出的隐私保护数据挖掘方法效率进行评估,实验结果表明,提出的基于格的隐私保护数据挖掘算法与现有的方法相比具有更高的数据分析精确性和计算效率.

    Abstract:

    Due to the various advantages of cloud computing, users tend to outsource data mining task to professional cloud service providers. However, user's privacy cannot be guaranteed. Currently, while many scholars are concerned about how to protect sensitive data from unauthorized access, few scholars engage research on data analysis. But if potential knowledge cannot be mined, the value of big data may not be fully utilized. This paper proposes a privacy preserving data mining (PPDM) method based on lattice, which support ciphertext intermediate point and distance homomorphic computing. Meanwhile, it builds a privacy preserving cloud ciphertext data clustering data mining Method. Users encrypt privacy data before outsource the data to cloud service providers, cloud service providers use homomorphic encryption to achieve privacy protection mining algorithms including k-means, hierarchical clustering and DBSCAN. Compared with the existing PPDM method, the presented method with high security is based on shortest vector difficulties (SVP) and the closest vector problem (CVP). Meanwhile, it maintains the accuracy of distance between two data, providing mining results with high accuracy and availability. Experiments are designed for the privacy preserving cluster mining (PPCM) with cardiac arrhythmia datasets of machine learning, and the experimental results show that the method based on lattice ensure not only security but also accuracy and performance.

    参考文献
    [1] Ni WW, Chen G, Chong ZH, Wu YJ. Privacy-Preserving data publication for clustering. Journal of Computer Research and Development, 2012,49(5):1095-1104. (in Chinese with English abstract)
    [2] Ding Y, Wang HM, Shi PC, Wu QB, Dai HD, Fu HY. Trusted Cloud Service. Chinese Journal of Computers, 2015,38(1):133-149(in Chinese with English abstract).[doi:10.3724/SP.J.1016.2015.00133]
    [3] Wu XD, Zhu XQ, Wu GQ, Ding W. Data mining with big data. IEEE Trans. on Knowledge and Data Engineering, 2014,26(1):97-107.[doi:10.1109/TKDE.2013.109]
    [4] Agrawal R, Srikant R. Privacy-Preserving data mining. ACM Sigmod Record, 2000,29(2):439-450.[doi:10.1145/342009.335438]
    [5] Zhou SG, Li F, Tao YF, Xiao XK. Privacy preservation in data applications:A survey. Chinese Journal of Computers, 2009,32(5):847-861(in Chinese with English abstract).[doi:10.3724/SP.J.1016.2009.00847]
    [6] Mohammed N, Chen R, Fung BCM, Yu PS. Differentially private data release for data mining. In:Proc. of the 17th ACM SIGKDD Int'l Conf. on Knowledge Discovery and Data Mining. ACM Press, 2011. 493-501.[doi:10.1145/2020408.2020487]
    [7] Allard T, Hébrail G, Masseglia F, Pacitti E. Chiaroscuro:Transparency and privacy for massive personal time-series clustering. In:Proc. of the 2015 ACM SIGMOD Int'l Conf. on Management of Data. ACM Press, 2015. 779-794.[doi:10.1145/2723372. 2749453]
    [8] Vlachos M, Schneider J, Vassiliadis VG. On data publishing with clustering preservation. ACM Trans. on Knowledge Discovery from Data (TKDD), 2015,9(3):23.[doi:10.1145/2700403]
    [9] Giannotti F, Lakshmanan LVS, Monreale A, Pedreschi D, Wang H. Privacy-Preserving mining of association rules from outsourced transaction databases. Systems Journal, IEEE, 2013,7(3):385-395.[doi:10.1109/JSYST.2012.2221854]
    [10] Fong PK, Weber-Jahnke JH. Privacy preserving decision tree learning using unrealized data sets. IEEE Trans. on Knowledge and Data Engineering, 2012,24(2):353-364.[doi:10.1109/TKDE.2010.226]
    [11] Johnson A, Shmatikov V. Privacy-Preserving data exploration in genome-wide association studies. In:Proc. of the 19th ACM SIGKDD Int'l Conf. on Knowledge Discovery and Data Mining. ACM Press, 2013. 1079-1087.[doi:10.1145/2487575.2487687]
    [12] Zhang XY, Liu C, Nepal S, Pandey S, Chen JJ. A privacy leakage upper bound constraint-based approach for cost-effective privacy preserving of intermediate data sets in cloud. IEEE Trans. on Parallel and Distributed Systems, 2013,24(6):1192-1202.[doi:10.1109/TPDS.2012.238]
    [13] Nguyen P. Cryptanalysis of the goldreich-goldwasser-halevi cryptosystem from crypto'97. In:Proc. of the Annual Int'l Cryptology Conf. Berlin, Heidelberg:Springer-Verlag, 1999. 288-304.[doi:10.1007/3-540-48405-1_18]
    [14] Yi X, Zhang Y. Equally contributory privacy-preserving k-means clustering over vertically partitioned data. Information Systems, 2013,38(1):97-107.[doi:10.1016/j.is.2012.06.001]
    [15] Oliveira SRM, Zaïane OR. Achieving privacy preservation when sharing data for clustering. In:Proc. of the Workshop on Secure Data Management in Conjunction with VLDB. Berlin, Heidelberg:Springer-Verlag, 2004. 67-82.[doi:10.1007/978-3-540-30073-1_6]
    [16] Islam MZ, Brankovic L. Privacy preserving data mining:A noise addition framework using a novel clustering technique. Knowledge-Based Systems, 2011,24(8):1214-1223.[doi:10.1016/j.knosys.2011.05.011]
    [17] Zhu Y, Liu L. Optimal randomization for privacy preserving data mining. In:Proc. of the 10th ACM SIGKDD Int'l Conf. on Knowledge Discovery and Data Mining. ACM Press, 2004. 761-766.[doi:10.1145/1014052.1014153]
    [18] Chong ZH, Ni WW, Liu TT, Zhang Y. A privacy-preserving data publishing algorithm for clustering application. Journal of Computer Research and Development, 2010,47(12):2083-2089(in Chinese with English abstract).
    [19] Li YP, Chen MH, Li QW, Zhang W. Enabling multilevel trust in privacy preserving data mining. IEEE Trans. on Knowledge and Data Engineering, 2012,24(9):1598-1612.[doi:10.1109/TKDE.2011.124]
    [20] Navarro-Arribas G, Torra V, Erola A, Castella-Roca J. User k-anonymity for privacy preserving data mining of query logs. Information Processing & Management, 2012,48(3):476-487.[doi:10.1016/j.ipm.2011.01.004]
    [21] Xiao X, Yi K, Tao Y. The hardness and approximation algorithms for l-diversity. In:Proc. of the 13th Int'l Conf. on Extending Database Technology. ACM Press, 2010. 135-146.[doi:10.1145/1739041.1739060]
    [22] Dwork C. A firm foundation for private data analysis. Communications of the ACM, 2011,54(1):86-95.[doi:10.1145/1866739. 1866758]
    [23] Mohan P, Thakurta A, Shi E, Song D, Culler DE. GUPT:Privacy preserving data analysis made easy. In:Proc. of the 2012 ACM SIGMOD Int'l Conf. on Management of Data. ACM Press, 2012. 349-360.[doi:10.1145/2213836.2213876]
    [24] Parameswaran R, Blough DM. Privacy preserving data obfuscation for inherently clustered data. Int'l Journal of Information and Computer Security, 2008,2(1):4-26.[doi:10.1504/IJICS.2008.016819]
    [25] Nissim K, Raskhodnikova S, Smith A. Smooth sensitivity and sampling in private data analysis. In:Proc. of the 39th Annual ACM Symp. on Theory of Computing. ACM Press, 2007. 75-84.[doi:10.1145/1250790.1250803]
    [26] Blake CL, Merz CJ. UCI repository of machine learning databases. Irvine:University of California, Department of Information and Computer Science, 1998. http://www.ics.uci.edu/~mlearn/MLRepository.html
    附中文参考文献:
    [1] 倪巍伟,陈耿,崇志宏,吴英杰.面向聚类的数据隐藏发布研究.计算机研究与发展,2012,49(5):1095-1104.
    [2] 丁滟,王怀民,史佩昌,吴庆波,戴华东,富弘毅.可信云服务.计算机学报,2015,38(1):133-149.[doi:10.3724/SP.J.1016.2015.00133]
    [5] 周水庚,李丰,陶宇飞,肖小奎.面向数据库应用的隐私保护研究综述.计算机学报,2009,32(5):847-861.[doi:10.3724/SP.J.1016. 2009.00847]
    [18] 崇志宏,倪巍伟,刘腾腾,张勇.一种面向聚类的隐私保护数据发布方法.计算机研究与发展,2010,47(12):2083-2089.
    引证文献
    网友评论
    网友评论
    分享到微博
    发 布
引用本文

崔一辉,宋伟,王占兵,史成良,程芳权.一种基于格的隐私保护聚类数据挖掘方法.软件学报,2017,28(9):2293-2308

复制
分享
文章指标
  • 点击次数:4558
  • 下载次数: 6838
  • HTML阅读次数: 3825
  • 引用次数: 0
历史
  • 收稿日期:2016-07-10
  • 最后修改日期:2016-11-10
  • 在线发布日期: 2017-09-02
文章二维码
您是第20250560位访问者
版权所有:中国科学院软件研究所 京ICP备05046678号-3
地址:北京市海淀区中关村南四街4号,邮政编码:100190
电话:010-62562563 传真:010-62562533 Email:jos@iscas.ac.cn
技术支持:北京勤云科技发展有限公司

京公网安备 11040202500063号