Supported by the National Natural Science Foundation of China (62076124, 62106102) and the Fundamental Research Funds for the Central Universities (NJ2020023)
With the free supervision signals/labels created by pretext tasks, self-supervised learning (SSL) can learn effective representations from unlabeled data, which has been verified on a variety of downstream tasks. Existing pretext tasks usually first apply explicit linear or nonlinear transformations to the original view data, thereby forming multiple augmented views, and then learn representations by predicting the corresponding transformations (or their labels) or by maximizing the consistency among the views. It is observed that such self-supervised augmentation (i.e., the augmentation of both the data itself and the self-supervised labels) benefits not only unsupervised pretext tasks but also supervised classification tasks. Nevertheless, few works have paid attention to this so far: existing ones either take the pretext task as an auxiliary of the downstream classification task and adopt multi-task learning, or jointly model the downstream-task labels and the self-supervised labels in a multi-label learning manner. However, there are inherent differences between downstream and pretext tasks (e.g., in semantics and task difficulty), which inevitably cause competition between the two and bring risks to the learning of downstream tasks. To address this issue, this study proposes a simple yet effective self-supervised multi-view learning framework (SSL-MV), which avoids the interference of self-supervised labels with the learning of downstream labels by performing the same learning as the downstream task on the augmented data views. More interestingly, thanks to multi-view learning, the proposed framework naturally possesses ensemble inference ability, which significantly improves the performance of downstream supervised classification tasks. Extensive experiments on benchmark datasets demonstrate the effectiveness of SSL-MV.
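The core idea described above — keeping the downstream label space on every augmented view and averaging the per-view predictions at inference time — can be illustrated with a minimal, dependency-free sketch. This is not the paper's implementation: the cyclic-shift "augmentation", the linear per-view scorers, and the names `augment`, `predict`, and `ensemble_predict` are all illustrative assumptions.

```python
import random

NUM_CLASSES = 3
FEAT_DIM = 4
SHIFTS = [0, 1, 2, 3]  # one augmented view per cyclic shift (shift 0 = original view)

def augment(x, shift):
    """Stand-in for a pretext-style transformation: cyclically shift the features."""
    return x[shift:] + x[:shift]

def predict(weights, x):
    """Linear class scores for one view; `weights` holds one weight vector per class."""
    return [sum(w_i * x_i for w_i, x_i in zip(w, x)) for w in weights]

def ensemble_predict(models, x):
    """SSL-MV-style ensemble inference: every view shares the *downstream*
    label space, so per-view class scores can simply be averaged."""
    scores = [0.0] * NUM_CLASSES
    for shift, weights in zip(SHIFTS, models):
        view = augment(x, shift)
        for c, s in enumerate(predict(weights, view)):
            scores[c] += s / len(models)
    return scores.index(max(scores))

random.seed(0)
# One toy classifier per view. In the framework these would be trained jointly
# on the shared downstream labels; here they are random placeholders.
models = [[[random.uniform(-1, 1) for _ in range(FEAT_DIM)]
           for _ in range(NUM_CLASSES)] for _ in SHIFTS]
label = ensemble_predict(models, [0.5, -0.2, 0.1, 0.9])
```

Because each view predicts in the same label space as the downstream task (rather than predicting transformation labels), no pretext objective competes with the downstream one, and averaging the per-view scores yields the ensemble effect noted in the abstract.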