Fast Clustering-Based Anonymization Algorithm for Data Streams

doi:10.3724/SP.J.1001.2013.04330


Authors:

GUO Kun
College of Mathematics and Computer Science, Fuzhou University, Fuzhou 350108, China

ZHANG Qi-Shan
College of Management, Fuzhou University, Fuzhou 350108, China

Funding: National Natural Science Foundation of China (70871024); Natural Science Foundation of Fujian Province (2010J01358); Science and Technology Development Fund of Fuzhou University (201-xy-16)


Abstract:

To prevent the disclosure of sensitive information and protect users' privacy, techniques such as generalization and suppression are often used to anonymize the quasi-identifiers of data before it is shared. Unlike static datasets, data streams are potentially infinite and highly dynamic, so anonymizing them raises more complicated problems, and anonymization methods designed for static datasets cannot be applied directly. Building on an analysis of existing data stream anonymization methods, this paper proposes a clustering-based anonymization approach for data streams. The approach scans the data in a single pass, recognizing and reusing clusters that satisfy the anonymization requirements, which speeds up the anonymization process. Experimental results on real data show that the proposed method satisfies the anonymization requirements while reducing the information loss caused by generalization and suppression, and that it has low time and space complexity.

Key words: data anonymization; data stream; clustering
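The abstract gives no algorithmic detail, so the following is only a minimal sketch of the general idea it describes (single-pass clustering, generalization of quasi-identifiers, reuse of clusters that already satisfy the anonymity requirement, suppression of leftovers). It is not the authors' algorithm. Everything here is an assumption made for illustration: numeric quasi-identifiers only, k-anonymity as the requirement, interval generalization, and the hypothetical names `anonymize_stream`, `k` and `max_growth`.

```python
from dataclasses import dataclass, field
from typing import Iterable, Iterator, List, Tuple

Record = Tuple[float, ...]               # numeric quasi-identifier values (assumption)
Box = Tuple[Tuple[float, float], ...]    # one (low, high) interval per attribute


@dataclass(eq=False)                     # compare clusters by identity, not by content
class Cluster:
    box: Box
    members: List[Record] = field(default_factory=list)


def covers(box: Box, rec: Record) -> bool:
    """True if every value of rec lies inside the matching interval of box."""
    return all(lo <= v <= hi for (lo, hi), v in zip(box, rec))


def enlarge(box: Box, rec: Record) -> Box:
    """Smallest box that contains both the old box and rec."""
    return tuple((min(lo, v), max(hi, v)) for (lo, hi), v in zip(box, rec))


def width(box: Box) -> float:
    """Sum of interval widths: a crude stand-in for an information-loss metric."""
    return sum(hi - lo for lo, hi in box)


def anonymize_stream(stream: Iterable[Record], k: int,
                     max_growth: float) -> Iterator[Box]:
    """Yield one generalized box per published record, in a single pass."""
    published: List[Box] = []      # boxes of clusters that already reached size k
    pending: List[Cluster] = []    # clusters still smaller than k
    for rec in stream:
        # Reuse: a record covered by an already published cluster is
        # released at once with that cluster's generalization.
        hit = next((b for b in published if covers(b, rec)), None)
        if hit is not None:
            yield hit
            continue
        # Otherwise join the pending cluster whose box grows the least,
        # provided the growth stays within max_growth.
        best, best_cost = None, max_growth
        for c in pending:
            cost = width(enlarge(c.box, rec)) - width(c.box)
            if cost <= best_cost:
                best, best_cost = c, cost
        if best is None:
            best = Cluster(box=tuple((v, v) for v in rec))
            pending.append(best)
        best.box = enlarge(best.box, rec)
        best.members.append(rec)
        # Once a cluster holds k records, publish all of them together.
        if len(best.members) >= k:
            pending.remove(best)
            published.append(best.box)
            for _ in best.members:
                yield best.box
    # Records left in pending clusters are never yielded, i.e. suppressed.
```

In this sketch a smaller `max_growth` keeps the published intervals tight at the cost of more suppressed records, which mirrors the trade-off between information loss and anonymity that the abstract mentions. A real stream anonymizer would also need a delay bound and memory limits, which are left out here.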


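Cite this article:

GUO Kun, ZHANG Qi-Shan. Fast clustering-based anonymization algorithm for data streams. Ruan Jian Xue Bao/Journal of Software, 2013,24(8):1852-1867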


History:

Received: 2011-07-29
Last revised: 2012-03-23
Published online: 2013-07-26
