Abstract: In multi-label learning, each sample is associated with multiple labels, and a key task is how to exploit the correlations between labels when building the model. The multi-label deep forest (MLDF) algorithm attempts to mine label correlations through layer-by-layer representation learning within a deep ensemble learning framework, using the learned label probability representation to improve prediction accuracy. However, on the one hand, the label probability representation is highly correlated with the label information itself, which leads to low diversity; as the depth of the deep forest increases, performance degrades. On the other hand, computing the label probabilities requires storing the forest structures of all layers and applying them one by one at test time, incurring prohibitive computational and storage overhead. To address these problems, this study proposes interaction-representation-based MLDF (iMLDF). iMLDF mines structural information in the feature space from the decision paths of the forest model, extracts feature interactions along decision tree paths using random interaction trees, and obtains two interaction representations: a feature confidence score and a label probability distribution. On the one hand, iMLDF makes full use of the structural feature information in the forest model to enrich the correlation information between labels; on the other hand, it computes all representations through interaction expressions, so the algorithm does not need to store all forest structures, which greatly improves computational efficiency. Experimental results show that iMLDF achieves better prediction performance and improves computational efficiency by an order of magnitude compared with MLDF on datasets with massive samples.
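The sketch below is a minimal, hypothetical illustration of the general idea of deriving a representation from decision paths in a forest; it uses scikit-learn's standard `decision_path` API rather than the paper's random interaction trees, and all names (`path_repr`, `X_aug`) are placeholders introduced here, not part of iMLDF.

```python
import numpy as np
from sklearn.datasets import make_multilabel_classification
from sklearn.ensemble import RandomForestClassifier

# Toy multi-label data: Y is an (n_samples, n_labels) binary indicator matrix.
X, Y = make_multilabel_classification(n_samples=200, n_features=20,
                                      n_classes=5, random_state=0)

# scikit-learn forests handle multi-label targets natively.
forest = RandomForestClassifier(n_estimators=10, max_depth=4, random_state=0)
forest.fit(X, Y)

# decision_path returns a sparse node-indicator matrix: which tree nodes each
# sample visits. Concatenated across trees, it encodes the sequence of feature
# tests (i.e., feature interactions) along each root-to-leaf path.
indicator, n_nodes_ptr = forest.decision_path(X)
path_repr = indicator.toarray().astype(np.float32)

# One (hypothetical) way to pass structural information to a next layer:
# augment the original features with the path-based representation.
X_aug = np.hstack([X, path_repr])
print(X.shape, path_repr.shape, X_aug.shape)
```

This only demonstrates that path-level structure can be extracted and reused as a representation without re-running earlier layers at test time; iMLDF's actual interaction expressions, confidence scores, and label probability distributions are defined in the paper itself.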