Survey on Contribution Evaluation for Federated Learning
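Authors: WANG Yong, LI Guoliang, LI Kaiyu
About the authors:

WANG Yong (1996-), male, Ph.D. candidate, CCF student member. His research interests include federated learning and spatio-temporal data management and applications. LI Kaiyu (1992-), male, Ph.D. His research interests include approximate query processing, data integration, and crowdsourcing. LI Guoliang (1981-), male, Ph.D., professor and doctoral supervisor, CCF distinguished member. His research interests include databases, big data analytics and mining, and crowd computing.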

Corresponding author:

李国良,liguoliang@tsinghua.edu.cn

Funding:

National Natural Science Foundation of China (61925205); project supported by the Beijing National Information Research Center


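    Abstract:

    Federated learning keeps data with their owners and is a new paradigm that lets multiple data holders collaboratively train machine learning models. Evaluating the contribution of each data holder that participates in federated learning is one of the field's core problems. Participant contribution evaluation must balance effectiveness, fairness, and rationality, and it faces several challenges in both theory and practice. First, contribution evaluation requires a clear way to measure data value. However, data valuation is subjective and depends on the concrete task scenario. The first major challenge is therefore to design data valuation metrics that are effective, reliable, and robust to malicious data. Second, participant contribution evaluation in federated learning is a classic cooperative game problem. The second major challenge is to design a fair and rational evaluation scheme that reaches a game equilibrium accepted by all participants. Finally, participant contribution evaluation often has high computational complexity, and valuing data through trained models in federated learning is time-consuming. The third major challenge is therefore to design efficient and accurate approximation algorithms for practical use. In recent years, the research community has studied contribution evaluation in federated learning extensively to address these challenges. This survey first briefly introduces the background of federated learning and participant contribution evaluation. It then reviews data valuation metrics, participant contribution evaluation schemes, and related optimization techniques. Finally, it discusses the challenges that remain for contribution evaluation in federated learning and outlines directions for future research.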
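The Shapley value is a standard solution concept for cooperative games, and it makes the third challenge concrete. The sketch below is illustrative only and is not the method of any particular surveyed work. It assumes a hypothetical utility function `v` that maps a coalition of participant ids to a score, for example the validation accuracy of a model trained on that coalition's data.

- `exact_shapley` enumerates every coalition, so it needs on the order of 2^n utility evaluations. In federated learning, each of those evaluations could mean training a model.
- `monte_carlo_shapley` is a permutation-sampling approximation. It averages each participant's marginal gain over randomly ordered joins.

The dataset sizes and the saturating `toy_utility` are hypothetical stand-ins.

```python
import itertools
import math
import random
from typing import Callable, Dict, FrozenSet, List

# Hypothetical utility: maps a coalition of participant ids to a score,
# e.g. validation accuracy of a model trained on the coalition's data.
Utility = Callable[[FrozenSet[int]], float]


def exact_shapley(players: List[int], v: Utility) -> Dict[int, float]:
    """Exact Shapley value by enumerating every coalition (exponential in n)."""
    n = len(players)
    phi: Dict[int, float] = {}
    for i in players:
        others = [p for p in players if p != i]
        total = 0.0
        for k in range(n):  # k = size of the coalition S that excludes i
            weight = math.factorial(k) * math.factorial(n - k - 1) / math.factorial(n)
            for members in itertools.combinations(others, k):
                s = frozenset(members)
                total += weight * (v(s | {i}) - v(s))  # marginal gain of i
        phi[i] = total
    return phi


def monte_carlo_shapley(players: List[int], v: Utility,
                        num_permutations: int, seed: int = 0) -> Dict[int, float]:
    """Permutation-sampling estimate: average marginal gains over random join orders."""
    rng = random.Random(seed)
    phi = {i: 0.0 for i in players}
    for _ in range(num_permutations):
        order = list(players)
        rng.shuffle(order)
        coalition: FrozenSet[int] = frozenset()
        prev = v(coalition)
        for i in order:
            coalition = coalition | {i}  # participant i joins the coalition
            cur = v(coalition)
            phi[i] += cur - prev
            prev = cur
    return {i: total / num_permutations for i, total in phi.items()}


if __name__ == "__main__":
    # Hypothetical local dataset sizes for three participants.
    sizes = {0: 100, 1: 200, 2: 300}

    def toy_utility(coalition: FrozenSet[int]) -> float:
        # Saturating stand-in for model accuracy: more data, diminishing gains.
        return 1.0 - math.exp(-sum(sizes[i] for i in coalition) / 200.0)

    players = list(sizes)
    print(exact_shapley(players, toy_utility))
    print(monte_carlo_shapley(players, toy_utility, num_permutations=2000))
```

Both functions satisfy efficiency: the values sum to v(all participants) minus v(empty set). The sampling estimate is unbiased, and each sampled permutation costs only n utility evaluations. This is why sampling-based approximations are attractive when every evaluation requires model training.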


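Citation:

WANG Yong, LI Guoliang, LI Kaiyu. Survey on contribution evaluation for federated learning. Ruan Jian Xue Bao/Journal of Software, 2023, 34(3):1168-1192.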

History
  • Received: 2022-05-15
  • Last revised: 2022-09-07
  • Published online: 2022-10-26
  • Published: 2023-03-06