Adaptive Active Learning for Semi-supervised Learning
Authors: Li Yanchao, Xiao Fu, Chen Zhi, Li Bo

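Author biographies:

Li Yanchao (born 1990), male, Ph.D., lecturer. Main research areas: artificial intelligence and big data management.
Xiao Fu (born 1980), male, Ph.D., professor, doctoral supervisor, CCF senior member. Main research areas: sensor networks and the Internet of Things.
Chen Zhi (born 1978), male, Ph.D., professor, CCF professional member. Main research areas: software engineering, wireless sensor networks, the Internet of Things, and data mining.
Li Bo (born 1979), male, senior engineer. Main research areas: natural language processing and knowledge graphs.

Corresponding author:

Li Yanchao, E-mail: yanchao@njupt.edu.cn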

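Funding:

National Natural Science Foundation of China (61932013); Natural Science Foundation of Jiangsu Province of China (BK20200739); Research Foundation of Jiangsu for 333 High Level Talents Training Project (BRA2020065)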


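    Abstract (translated from the Chinese):

    Active learning selects examples from a large pool of unlabeled data and hands them to an expert for labeling. Existing batch-mode active learning algorithms face three main limitations: (1) some methods rely on a single selection criterion or on assumptions about the data or the model, which makes it hard to find unlabeled examples that are both uncertain and representative; (2) the performance of existing batch-mode methods depends heavily on how accurately the similarity between examples is measured, for example through a predefined function or a dissimilarity measure; (3) noisy labels have persistently degraded the performance of batch-mode active learning algorithms. This paper proposes a batch-mode active learning method based on deep learning. A deep neural network produces learned representations of the labeled and unlabeled examples, and a label cycle scheme links each labeled example to unlabeled examples and then back to labeled examples of the same class. The method thereby accounts for both the uncertainty and the representativeness of examples, and it is robust to noisy labels. Within the proposed method, a submodular function ensures that the selected set of examples is diverse. In addition, optimizing an adaptive parameter lets the algorithm balance uncertainty and representativeness automatically. The method is applied to semi-supervised classification and semi-supervised clustering, and the experimental results show that it outperforms several existing state-of-the-art methods.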

    Abstract:

    Active learning algorithms attempt to overcome the labeling bottleneck by querying labels for examples drawn from a large pool of unlabeled data. Existing batch-mode active learning algorithms suffer from three limitations: (1) models that rely on assumptions about the data have difficulty finding examples that are both informative and representative; (2) methods based on a predefined similarity function or on optimizing a particular diversity measure may perform suboptimally and select redundant examples; (3) noisy labels remain an obstacle for active learning algorithms. This study proposes a novel batch-mode active learning method based on deep learning. A deep neural network generates representations (embeddings) of the labeled and unlabeled examples, and a label cycle scheme connects the embeddings of labeled examples to those of unlabeled examples and back to labeled examples of the same class. This scheme accounts for both the informativeness and the representativeness of examples and is robust to noisy labels. A submodular function is designed to reduce redundancy among the selected examples. Moreover, the weights on the query criteria losses are optimized during active learning, so that the trade-off between informative and representative examples is set automatically. The proposed method is applied to semi-supervised classification and semi-supervised clustering. For classification, the batch-mode active scheme is incorporated into the classification approaches to improve generalization. For semi-supervised clustering, the proposed active scheme selects constraints, which speeds up convergence and yields better results than unsupervised clustering. Extensive experiments on diverse benchmark datasets for the different tasks show consistent and substantial improvements over state-of-the-art approaches.
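    The abstract does not state the submodular function or the query criteria in closed form, so the following is only a minimal sketch of the general idea, under stated assumptions. It greedily selects a batch by maximizing a stand-in objective: predictive entropy as the uncertainty term plus a facility-location coverage term (a standard monotone submodular function) as the representativeness and diversity term. The names `select_batch` and `trade_off`, the use of cosine similarity, and the objective itself are hypothetical and not taken from the paper. In particular, `trade_off` is a fixed number here, whereas the paper describes this balance as being optimized automatically.

```python
import math


def entropy(probs):
    """Shannon entropy (in nats) of one predicted class distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def cosine(u, v):
    """Cosine similarity between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm > 0.0 else 0.0


def select_batch(embeddings, class_probs, batch_size, trade_off=0.5):
    """Greedily pick indices of unlabeled examples to query.

    Stand-in objective (an assumption, not the paper's formulation):
        F(S) = trade_off * sum_{i in S} entropy_i
             + (1 - trade_off) * sum_j max_{i in S} sim(i, j)
    where sim is cosine similarity clipped at zero.
    """
    n = len(embeddings)
    sim = [[max(cosine(embeddings[i], embeddings[j]), 0.0) for j in range(n)]
           for i in range(n)]
    uncertainty = [entropy(p) for p in class_probs]
    covered = [0.0] * n  # best similarity of each example to the batch so far
    selected = []
    for _ in range(min(batch_size, n)):
        best, best_gain = None, -1.0
        for i in range(n):
            if i in selected:
                continue
            # Marginal coverage gain: how much candidate i improves coverage
            # of the pool beyond what the current batch already provides.
            coverage_gain = sum(max(sim[i][j] - covered[j], 0.0)
                                for j in range(n))
            gain = trade_off * uncertainty[i] + (1.0 - trade_off) * coverage_gain
            if gain > best_gain:
                best, best_gain = i, gain
        selected.append(best)
        covered = [max(covered[j], sim[best][j]) for j in range(n)]
    return selected


if __name__ == "__main__":
    # Toy pool: two tight clusters; the first cluster has more uncertain predictions.
    pool = [[1.0, 0.0], [0.99, 0.05], [0.0, 1.0], [0.05, 0.99]]
    probs = [[0.5, 0.5], [0.5, 0.5], [0.9, 0.1], [0.9, 0.1]]
    print(select_batch(pool, probs, batch_size=2, trade_off=0.5))
```

    Because a near-duplicate of an already selected example adds almost no coverage, its marginal gain drops sharply, which is how a submodular term discourages redundant queries. Setting `trade_off` to 1.0 reduces the sketch to plain uncertainty sampling, and setting it to 0.0 reduces it to pure coverage.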

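Cite this article

Li Yanchao, Xiao Fu, Chen Zhi, Li Bo. Adaptive active learning for semi-supervised learning. Journal of Software (软件学报), 2020, 31(12): 3808-3822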

History
  • Received: 2019-07-07
  • Last revised: 2019-07-28
  • Published online: 2020-12-03
  • Publication date: 2020-12-06