[Keywords]
[Abstract]
Traditional supervised learning requires ground-truth labels for the training data, which can be difficult to collect in many cases. In contrast, crowdsourcing learning collects noisy annotations from multiple non-expert workers and infers the latent true labels through an aggregation method. This study observes that existing deep crowdsourcing methods do not sufficiently model worker correlations, even though previous non-deep approaches have shown that exploiting such correlations improves learning. A deep generative crowdsourcing learning model is proposed that combines the strengths of deep neural networks (DNNs) with the exploitation of worker correlations. The model consists of a DNN classifier that serves as the prior over the true labels, and an annotation generation process that models inter-worker correlation by introducing, within each class, a mixture model over workers' reliabilities. To adaptively trade off model complexity against data fitting, fully Bayesian inference is developed. Building on the natural-gradient stochastic variational inference techniques developed for the structured variational autoencoder (SVAE), variational message passing for the conjugate parameters and stochastic gradient descent for the DNN parameters are combined in a unified framework, enabling efficient end-to-end optimization. Experimental results on 22 real-world crowdsourcing datasets demonstrate the effectiveness of the proposed approach.
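The generative model summarized in the abstract can be illustrated with a minimal sampling sketch. It assumes, in the spirit of the within-class mixture of worker reliabilities, that each true class k has M latent subtypes, that an item's subtype selects one label distribution (a confusion-matrix row) per worker, and that a linear softmax classifier stands in for the DNN prior. The names `sample_annotations`, `subtype_weights`, and `worker_rows`, and all sizes, are hypothetical; this is not the authors' implementation, and it omits the Bayesian priors and the variational inference procedure entirely.

```python
import numpy as np


def softmax(logits):
    # Numerically stable softmax over the last axis.
    shifted = logits - logits.max(axis=-1, keepdims=True)
    e = np.exp(shifted)
    return e / e.sum(axis=-1, keepdims=True)


def sample_annotations(x, W, subtype_weights, worker_rows, rng):
    """Sample true labels, within-class subtypes and crowd labels.

    x:               (N, D) item features
    W:               (D, K) weights of a linear softmax classifier that
                     stands in for the DNN prior (hypothetical)
    subtype_weights: (K, M) mixture weights over M subtypes inside each class
    worker_rows:     (J, K, M, K) label distribution of worker j for an item
                     whose true class is k and whose subtype is m
    """
    n = x.shape[0]
    num_workers, num_classes, num_subtypes, _ = worker_rows.shape
    # 1. Classifier prior: p(z_i | x_i) = softmax(x_i W).
    probs = softmax(x @ W)
    z = np.array([rng.choice(num_classes, p=p) for p in probs])
    # 2. Within-class mixture: each item picks a subtype of its true class.
    g = np.array([rng.choice(num_subtypes, p=subtype_weights[k]) for k in z])
    # 3. Every worker labels item i from the row chosen by (z_i, g_i); sharing
    #    g_i across workers is what makes their labels correlated given z_i.
    y = np.empty((n, num_workers), dtype=int)
    for i in range(n):
        for j in range(num_workers):
            y[i, j] = rng.choice(num_classes, p=worker_rows[j, z[i], g[i]])
    return z, g, y


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    N, D, K, M, J = 200, 5, 3, 2, 4
    x = rng.normal(size=(N, D))
    W = rng.normal(size=(D, K))
    subtype_weights = rng.dirichlet(np.ones(M), size=K)
    # Random rows pulled halfway toward the correct label, so that workers
    # are better than chance.
    noise = rng.dirichlet(np.ones(K), size=(J, K, M))
    worker_rows = 0.5 * noise + 0.5 * np.eye(K)[None, :, None, :]
    z, g, y = sample_annotations(x, W, subtype_weights, worker_rows, rng)
    majority = np.array([np.bincount(row, minlength=K).argmax() for row in y])
    print("majority-vote accuracy:", (majority == z).mean())
```

Because all workers draw from rows indexed by the same subtype g_i, their errors on an item are dependent given the true label. With M = 1 the sketch reduces to independent per-worker confusion matrices, as in classic Dawid-Skene-style models.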
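[CLC number]
[Funding]
National Natural Science Foundation of China (61906089); Basic Research Program of Jiangsu Province (BK20190408); China Postdoctoral Science Foundation (2019TQ0152)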