In multi-label learning (MLL) problems, each example is associated with a set of labels. To train a predictor that performs well on unseen examples, exploiting the relations between labels is crucial. Most existing studies simplify these relations to correlations among labels, typically measured by their co-occurrence. This study discloses that causal relations are more essential for describing how one label can help another during the learning process. Based on this observation, two strategies are proposed to generate causal orders of labels from the label causal directed acyclic graph (DAG), under the constraint that a cause label must be placed before its effect labels. The first strategy sorts a random order until it satisfies the cause-effect relations in the DAG. The second strategy partitions the labels into disjoint topological levels according to the structure of the DAG and then sorts the labels by these levels. Further, by incorporating the causal orders into the classifier chain (CC) model, an effective MLL approach is proposed that exploits label relations from this more essential view. Experimental results on multiple datasets validate that the extracted causal orders indeed provide helpful information for boosting predictive performance.
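The second strategy described above can be sketched as a layered topological sort of the label causal DAG: labels with no unprocessed causes are peeled off one level at a time, and concatenating the levels yields an order in which every cause precedes its effects. The following is a minimal illustrative sketch, not the paper's implementation; the DAG representation (a list of `(cause, effect)` edges over integer label indices) and all names are assumptions.

```python
from collections import deque

def causal_order_by_levels(num_labels, edges):
    """Partition labels into disjoint topological levels of a causal DAG,
    then flatten the levels into one causal order (cause before effect).

    `edges` contains (cause, effect) pairs over labels 0..num_labels-1.
    """
    children = {i: [] for i in range(num_labels)}
    indegree = [0] * num_labels
    for cause, effect in edges:
        children[cause].append(effect)
        indegree[effect] += 1

    # Kahn-style layered sort: a label enters the frontier only after
    # all of its cause labels have been assigned to earlier levels.
    frontier = deque(i for i in range(num_labels) if indegree[i] == 0)
    levels = []
    while frontier:
        level = sorted(frontier)  # tie-break within a level by index
        frontier = deque()
        for node in level:
            for child in children[node]:
                indegree[child] -= 1
                if indegree[child] == 0:
                    frontier.append(child)
        levels.append(level)

    order = [label for level in levels for label in level]
    return levels, order

# Toy causal DAG over 4 labels: 0 -> 2, 1 -> 2, 2 -> 3
levels, order = causal_order_by_levels(4, [(0, 2), (1, 2), (2, 3)])
```

Any such order can then be passed to a classifier chain, so that each label's classifier conditions on the predictions of its cause labels rather than on an arbitrary chain order.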