Abstract: Weak label learning is an important sub-branch of multi-label learning that has been widely studied and applied to recovering the missing labels of partially labeled instances and to classifying new instances. However, existing weak label learning methods are generally vulnerable to noisy and redundant features in high-dimensional data, where multiple labels and missing labels are more likely to be present. To accurately classify high-dimensional multi-label instances, this paper proposes an ensemble weak label classification method that maximizes the dependency between labels and features (EnWL for short). EnWL first repeatedly applies affinity propagation clustering in the feature space of the high-dimensional data to find cluster centers. Next, it uses the obtained cluster centers to construct representative feature subsets, thereby reducing the impact of noisy and redundant features. Then, EnWL trains a semi-supervised multi-label classifier on each feature subset by maximizing the dependency between labels and features. Finally, it combines these base classifiers into an ensemble classifier via majority vote. Experimental results on several high-dimensional datasets show that EnWL significantly outperforms related methods across various evaluation metrics.
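The sketch below illustrates the overall pipeline summarized in the abstract, assuming scikit-learn is available. The names `enwl_sketch`, the random column sampling used to diversify the repeated clustering runs, and the one-vs-rest logistic regression base learner are all illustrative assumptions; in particular, the base learner stands in for the paper's semi-supervised dependency-maximization classifier, which is not reproduced here.

```python
# Minimal sketch of the EnWL-style pipeline (not the authors' implementation).
# Assumptions: random column sampling for subset diversity; a supervised
# one-vs-rest logistic regression replaces the paper's semi-supervised
# dependency-maximization base learner.
import numpy as np
from sklearn.cluster import AffinityPropagation
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier

def enwl_sketch(X, Y, n_subsets=5, random_state=0):
    """X: (n_samples, n_features); Y: binary label matrix (n_samples, n_labels)."""
    rng = np.random.RandomState(random_state)
    base_models, feature_subsets = [], []
    for _ in range(n_subsets):
        # Cluster a random sample of feature columns with affinity propagation;
        # the exemplar (cluster-center) features form one representative subset.
        cols = rng.choice(X.shape[1], size=min(200, X.shape[1]), replace=False)
        ap = AffinityPropagation(random_state=random_state).fit(X[:, cols].T)
        subset = cols[ap.cluster_centers_indices_]
        feature_subsets.append(subset)
        # Train a base multi-label classifier on the reduced feature subset.
        clf = OneVsRestClassifier(LogisticRegression(max_iter=1000))
        clf.fit(X[:, subset], Y)
        base_models.append(clf)

    def predict(X_new):
        # Majority vote over the base classifiers' binary label predictions.
        votes = np.stack([m.predict(X_new[:, s])
                          for m, s in zip(base_models, feature_subsets)])
        return (votes.mean(axis=0) >= 0.5).astype(int)

    return predict
```

Clustering the columns of X (rather than its rows) is what makes each subset a set of representative features; the majority vote at the end mirrors the ensemble step described in the abstract.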