Knowledge Hiding in Database
Authors: 郭宇红, 童云海, 唐世渭, 杨冬青
Funding: Supported by the National Natural Science Foundation of China under Grant No. 60403041

    Abstract:

    Motivated by the combined requirements of data sharing, privacy preservation, and knowledge discovery, privacy preserving data mining (PPDM) has become a research hotspot in the data mining and information security fields in recent years. PPDM addresses two main problems: one is the hiding and protection of sensitive raw data; the other is the hiding and protection of sensitive knowledge contained in the data, which is called knowledge hiding in database (KHD). This paper classifies and surveys current KHD techniques. It first introduces the background in which KHD arose. It then focuses on techniques for hiding sensitive association rules and classification rules, and goes on to discuss evaluation metrics for KHD methods. Finally, it identifies three directions for future KHD research: the design of optimized measure functions based on target distance in data modification techniques, inverse frequent itemset mining in data reconstruction techniques, and the design of general knowledge hiding methods based on data sampling.

Cite this article

Guo YH, Tong YH, Tang SW, Yang DQ. Knowledge hiding in database. Journal of Software (软件学报), 2007, 18(11): 2782-2799.

History
  • Received: 2007-01-10
  • Last revised: 2007-05-10
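Copyright: Institute of Software, Chinese Academy of Sciences. 京ICP备05046678号-3
Address: No. 4 Zhongguancun South Fourth Street, Haidian District, Beijing; postal code: 100190
Tel: 010-62562563 Fax: 010-62562533 Email: jos@iscas.ac.cn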