Active Learning Approach for Crowdsourcing-enhanced Data Cleaning
Author:
Affiliation:

CLC Number:

TP311

Fund Project:

National Natural Science Foundation of China (U1509216, U1866602, 61472099, 61602129); National Key Research and Development Program (2016YFB1000703); Scientific Research Foundation for the Returned Overseas Chinese Scholars of Heilongjiang Province (LC2016026)

    Abstract:

    Traditional data cleaning methods usually rely on machine learning algorithms. Although these methods solve some problems, they still suffer from computational difficulty, insufficient knowledge, and other limitations. In recent years, with the rise of crowdsourcing, a growing body of research has introduced crowdsourcing into the data cleaning process to supply the extra knowledge that machine learning needs. Because workers on crowdsourcing platforms must be paid, it is essential to study how to combine machine learning algorithms with crowdsourcing effectively under a limited budget. This study proposes two active learning models to support crowdsourcing-enhanced data cleaning. Active learning is used to reduce the crowdsourcing cost, and data cleaning on a real crowdsourcing platform is realized for given data sets, reducing cost and improving data quality at the same time. Experimental results on real-world datasets show the effectiveness of the proposed methods.

Get Citation

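Ye Chen, Wang Hongzhi, Gao Hong, Li Jianzhong. Active Learning Approach for Crowdsourcing-enhanced Data Cleaning. Journal of Software, 2020, 31(4):1162-1172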

History
  • Received: July 20, 2018
  • Revised: October 08, 2018
  • Adopted:
  • Online: April 16, 2020
  • Published: April 06, 2020
Copyright: Institute of Software, Chinese Academy of Sciences Beijing ICP No. 05046678-4