[关键词]
[摘要]
大数据时代丰富的信息来源促进了机器学习技术的蓬勃发展,然而机器学习模型的训练集在数据采集、模型训练等各个环节中存在的隐私泄露风险,为人工智能环境下的数据管理提出了重大挑战.传统数据管理中的隐私保护方法无法满足机器学习中多个环节、多种场景下的隐私保护要求.分析并展望了机器学习技术中隐私攻击与防御的研究进展和趋势.首先介绍了机器学习中隐私泄露的场景和隐私攻击的敌手模型,并根据攻击者策略分类梳理了机器学习中隐私攻击的最新研究;介绍了当前机器学习隐私保护的主流基础技术,进一步分析了各技术在保护机器学习训练集隐私时面临的关键问题,重点分类总结了5种防御策略以及具体防御机制;最后展望了机器学习技术中隐私防御机制的未来方向和挑战.
[Keywords]
[Abstract]
In the era of big data, abundant data sources have driven the rapid development of machine learning. However, the risk that a model's training data is leaked during data collection, model training, and other stages poses a major challenge to data management in the age of artificial intelligence. Traditional privacy-preserving methods for data management and analysis cannot meet the privacy requirements that arise across the many stages and scenarios of machine learning. This study surveys the state of the art in privacy attacks and defenses in machine learning. First, it describes the scenarios in which privacy leakage occurs and the adversary models of privacy attacks, and classifies recent work on privacy attacks by adversarial strategy. Next, it introduces the 3 main technologies commonly applied to privacy preservation in machine learning and points out the key problems each faces when protecting the privacy of training data. It then categorizes and summarizes 5 defense strategies and their specific mechanisms. Finally, it discusses future directions and open challenges for privacy preservation in machine learning.
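[Chinese Library Classification number]
[Funding]
National Key Research and Development Program of China (2018YFB1004401); National Natural Science Foundation of China (61532021, 61772537, 61772536, 61702522)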