Cross-source Data Error Detection Approach Based on Federated Learning

doi:10.13328/j.cnki.jos.006781

Volume 34, Issue 3, 2023, pp. 1126-1147. DOI: 10.13328/j.cnki.jos.006781

Author:
CHEN Lu, College of Computer Science and Technology, Zhejiang University, Hangzhou 310027, China
GUO Yu-Xiang, College of Computer Science and Technology, Zhejiang University, Hangzhou 310027, China
GE Cong-Cong, Data Intelligence Innovation Lab, Huawei Cloud Computing Technologies Co. Ltd., Hangzhou 310052, China
ZHENG Bai-Hua, School of Computing and Information Systems, Singapore Management University, Singapore
GAO Yun-Jun, College of Computer Science and Technology, Zhejiang University, Hangzhou 310027, China
Abstract:

With the emergence and accumulation of massive data, data governance has become an important means of improving data quality and maximizing data value. Error detection is crucial to improving data quality and has attracted a surge of interest from both industry and academia. Various detection methods tailored to a single data source have been proposed. Nevertheless, in many real-world scenarios, data is not centrally stored and managed. Correlated data from different sources can be used to improve the accuracy of error detection. Unfortunately, due to privacy and security concerns, cross-source data often cannot be integrated centrally. To this end, this study proposes FeLeDetect, a cross-source data error detection method based on federated learning. First, a graph-based error detection model (GEDM) is presented to capture sufficient data features from each data source. Then, the study investigates a federated co-training algorithm (FCTA) to collaboratively train GEDM over different data sources without privacy leakage. Furthermore, the study designs a series of optimization methods to reduce both the communication cost of federated learning and the manual labeling effort. Extensive experiments on three real-life datasets demonstrate that GEDM achieves an average F1-score improvement of 10.3% in the local scenario and 25.2% in the centralized scenario, outperforming all five existing state-of-the-art competitors for a single data source, and that FeLeDetect further improves the F1-score of local GEDM by 23.2% on average.

Key words: data governance; data quality; error detection; federated learning

Get Citation

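CHEN Lu, GUO Yu-Xiang, GE Cong-Cong, ZHENG Bai-Hua, GAO Yun-Jun. Cross-source Data Error Detection Approach Based on Federated Learning. Journal of Software (软件学报), 2023, 34(3): 1126-1147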

History

Received: May 13, 2022
Revised: September 07, 2022
Online: October 26, 2022
Published: March 06, 2023
