Cross-source Data Error Detection Approach Based on Federated Learning
Author:
Affiliation:

Clc Number:

Fund Project:

  • Article
  • |
  • Figures
  • |
  • Metrics
  • |
  • Reference
  • |
  • Related
  • |
  • Cited by
  • |
  • Materials
  • |
  • Comments
    Abstract:

    With the emergence and accumulation of massive data, data governance has become an important manner to improve data quality and maximize data value. Error detection is crucial for improving data quality, which has attracted a surge of interests from both industry and academia. Various detection methods tailored for a single data source have been proposed. Nevertheless, in many real-world scenarios, data is not centrally stored and managed. Different sources of correlated data can be employed to improve the accuracy of error detection. Unfortunately, due to privacy/security issues, cross-source data is often not allowed to be integrated centrally. To this end, this study proposes FeLeDetect, a cross-source data error detection method based on federated learning. First, a graph-based error detection model (GEDM) is presented to capture sufficient data features from each data source. Then, the study investigates a federated co-training algorithm (FCTA) to collaboratively train GEDM over different data sources without privacy leakage. Furthermore, the study designs a series of optimization methods to reduce the communication cost during the federated learning and the manual labeling efforts. Extensive experiments on three real-life datasets demonstrate that GEDM achieves an average improvement of 10.3% F1-score in the local scenario and 25.2% F1-score in the centralized scenario, outperforming all the five existing state-of-the-art competitors for a single data source; and FeLeDetect further enhances local GEDM in terms of F1-score by 23.2% on average.

    Reference
    Related
    Cited by
Get Citation

陈璐,郭宇翔,葛丛丛,郑白桦,高云君.基于联邦学习的跨源数据错误检测方法.软件学报,2023,34(3):1126-1147

Copy
Share
Article Metrics
  • Abstract:
  • PDF:
  • HTML:
  • Cited by:
History
  • Received:May 13,2022
  • Revised:September 07,2022
  • Adopted:
  • Online: October 26,2022
  • Published:
You are the firstVisitors
Copyright: Institute of Software, Chinese Academy of Sciences Beijing ICP No. 05046678-4
Address:4# South Fourth Street, Zhong Guan Cun, Beijing 100190,Postal Code:100190
Phone:010-62562563 Fax:010-62562533 Email:jos@iscas.ac.cn
Technical Support:Beijing Qinyun Technology Development Co., Ltd.

Beijing Public Network Security No. 11040202500063