Abstract: Traditional data cleaning methods usually rely on machine learning algorithms. Although these methods can solve some problems, they still suffer from computational difficulties, a lack of sufficient knowledge, and other limitations. In recent years, with the rise of crowdsourcing, a growing body of research has introduced crowdsourcing into the data cleaning process to provide the extra knowledge that machine learning needs. Since workers on crowdsourcing platforms must be paid, it is essential to study how to combine machine learning algorithms with crowdsourcing effectively under a limited budget. This study proposes two active learning models to support crowdsourcing-enhanced data cleaning. Active learning is used to reduce the crowdsourcing cost, and data cleaning on a real crowdsourcing platform is realized for given datasets, reducing cost and improving data quality at the same time. Experimental results on real-world datasets show the effectiveness of the proposed methods.