Survey on Machine Unlearning (机器遗忘综述)

doi:10.13328/j.cnki.jos.007237




CSTR: 32375.14.jos.007237
                    
Authors: 李梓童, 孟小峰, 王雷霞, 郝新丽
Affiliation: School of Information, Renmin University of China (中国人民大学 信息学院), Beijing 100872

                    
Funding: National Natural Science Foundation of China (61941121, 91846204, 6217242)

Survey on Machine Unlearning

Authors: LI Zi-Tong, MENG Xiao-Feng, WANG Lei-Xia, HAO Xin-Li
Affiliation: School of Information, Renmin University of China, Beijing 100872, China


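Abstract (translated from the Chinese):

In recent years, machine learning has been applied ever more widely in daily life. Models are trained on historical data to predict future behavior, which makes everyday life considerably more convenient. However, machine learning carries a risk of privacy leakage: when users no longer want their personal data to be used, simply deleting that data from the training set is not enough, because the already-trained model still contains information about the user and may leak it. To make a machine learning model "forget" a user's personal data, the simplest approach is to retrain on a training set that excludes the data; the resulting model is then guaranteed to contain no information about it. Retraining, however, is often expensive, which gives rise to the key question of "machine unlearning": can a model that is as similar as possible to the retrained model be obtained at lower cost? This paper reviews and organizes the literature on this question, classifies existing machine unlearning methods into three categories (training-based, editing-based, and generation-based), introduces metrics for machine unlearning, tests and evaluates existing methods, and finally discusses future directions for machine unlearning.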


Abstract:

Machine learning has become increasingly prevalent in daily life. Various machine learning methods have been proposed that use historical data to make predictions, making people's lives more convenient. However, machine learning comes with a significant challenge: privacy leakage. Merely deleting a user's data from the training set is not sufficient to avoid privacy leakage, as the trained model may still retain this information. To tackle this challenge, the conventional approach is to retrain the model on a new training set that excludes the user's data. However, this method can be costly, which has prompted the search for a more efficient way to "unlearn" specific data while yielding a model comparable to a retrained one. This study summarizes the current literature on this topic, categorizing existing unlearning methods into three groups: training-based, editing-based, and generation-based methods. Additionally, various metrics are introduced to assess unlearning methods. The study also evaluates current unlearning methods in deep learning and concludes with future research directions in this field.

Keywords: machine learning; machine unlearning; deep learning; privacy protection

Cite this article

LI Zi-Tong, MENG Xiao-Feng, WANG Lei-Xia, HAO Xin-Li. Survey on Machine Unlearning. Journal of Software (软件学报), 2025, 36(4): 1637-1664.


History

Received: 2023-03-17
Last revised: 2024-04-29
Published online: 2024-11-18
Published: 2025-04-06
