Abstract: By selecting a subset of the base classifiers to combine, ensemble pruning aims to achieve better generalization and shorter prediction time than the ensemble of all base classifiers. However, most ensemble pruning algorithms in the literature spend considerable time on classifier selection. This paper presents a fast ensemble pruning approach, CPM-EP (coverage-based pattern mining for ensemble pruning). The algorithm converts an ensemble pruning task into a transaction database processing task, in which the predictions of all base classifiers on the validation set are organized as a transaction database. For each possible ensemble size k, CPM-EP obtains a refined transaction database and builds an FP-tree to compact it; it then selects an ensemble of size k. Among the ensembles obtained for all sizes, the one with the best predictive accuracy on the validation set is output. Experimental results show that CPM-EP reduces computational overhead considerably: its selection time is about 1/19 that of GASEN and 1/8 that of Forward Selection. Additionally, the approach achieves the best generalization performance in the experiments, and the pruned ensembles are small.
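The overall flow described in the abstract can be illustrated with a minimal Python sketch. It is not the CPM-EP implementation. It assumes that each validation instance forms one transaction holding the indices of the base classifiers that predict it correctly, and it replaces the refined database and FP-tree mining step with a simple greedy coverage heuristic. All function names (`build_transactions`, `greedy_cover`, `vote_accuracy`, `prune`) are hypothetical.

```python
from collections import Counter


def build_transactions(predictions, labels):
    """Assumed encoding: one transaction per validation instance, containing
    the indices of the classifiers that predicted that instance correctly."""
    return [
        {c for c, preds in enumerate(predictions) if preds[i] == labels[i]}
        for i in range(len(labels))
    ]


def greedy_cover(transactions, k, n_classifiers):
    """Stand-in selector (NOT the FP-tree mining step): repeatedly add the
    classifier that is correct on the most not-yet-covered instances."""
    chosen = []
    uncovered = list(range(len(transactions)))
    for _ in range(k):
        candidates = [c for c in range(n_classifiers) if c not in chosen]
        best = max(
            candidates,
            key=lambda c: sum(c in transactions[i] for i in uncovered),
        )
        chosen.append(best)
        uncovered = [i for i in uncovered if best not in transactions[i]]
    return chosen


def vote_accuracy(ensemble, predictions, labels):
    """Majority-vote accuracy of the selected classifiers on the validation set."""
    correct = 0
    for i, y in enumerate(labels):
        votes = Counter(predictions[c][i] for c in ensemble)
        if votes.most_common(1)[0][0] == y:
            correct += 1
    return correct / len(labels)


def prune(predictions, labels):
    """Build one candidate ensemble per size k and return the candidate with
    the best validation accuracy (the smallest one on ties)."""
    transactions = build_transactions(predictions, labels)
    n = len(predictions)
    candidates = [greedy_cover(transactions, k, n) for k in range(1, n + 1)]
    return max(candidates, key=lambda e: vote_accuracy(e, predictions, labels))
```

Here `predictions[c][i]` is the label that classifier `c` assigns to validation instance `i`. Calling `prune(predictions, labels)` returns a list of classifier indices. Only the outer loop over sizes and the final selection by validation accuracy follow the abstract directly; the per-size selection is a placeholder.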