Theoretical Analysis on Structured Learning with Noisy Data and Its Applications

Fund Project: National Natural Science Foundation of China (61173073); National High Technology Research and Development Program of China (863 Program) (2011AA01A207)

    Abstract:

    Existing theories of learning with noisy data show that the performance of supervised learning methods can be severely degraded by label noise in the training samples. However, these theories address only binary classification. This paper investigates how noise affects structured learning. First, it observes that in structured learning the noise in labeled data is amplified during training, so that the noise rate seen in the training procedure is higher than the error rate of the labels themselves. Existing theories do not account for this noise amplification and thus underestimate the complexity of the learning problem. Starting from this phenomenon, the paper proposes a new theory of learning from noisy data for structured learning problems. Based on the theory, it introduces the concept of "effective size of training data", which can be used in practice to describe the data quality of a noisy learning problem, and further analyzes the situations in which structured learning models used in real applications back off to lower-order models in high-noise settings. Experimental results confirm the correctness of the theory and demonstrate its practical value and guidance for cross-lingual projection and co-training.
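
The noise amplification described in the abstract can be illustrated with a small sketch. It is not the paper's theory: it assumes, purely for illustration, that each label is wrong independently with the same probability and that a structured feature (a "part") spanning several adjacent labels counts as noisy whenever any of its labels is wrong. The function name `part_noise_rate` and its parameters are hypothetical.

```python
def part_noise_rate(label_noise: float, order: int) -> float:
    """Probability that a part spanning `order` labels contains at least
    one wrong label, ASSUMING independent label errors at rate `label_noise`.

    This is an illustrative model only, not the formula from the paper.
    """
    if not 0.0 <= label_noise <= 1.0:
        raise ValueError("label_noise must lie in [0, 1]")
    if order < 1:
        raise ValueError("order must be a positive integer")
    # A part is clean only if all of its labels are clean.
    return 1.0 - (1.0 - label_noise) ** order


if __name__ == "__main__":
    # With a 10% label error rate, parts over more labels are noisier.
    for order in (1, 2, 3):
        print(order, round(part_noise_rate(0.1, order), 3))
```

Under these assumptions the noise rate of a part grows with the number of labels it spans, which is one intuitive reading of why a lower-order model may be the safer choice when labels are very noisy.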

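Cite this article

于墨, 赵铁军, 胡鹏龙, 郑德权. Theoretical Analysis on Structured Learning with Noisy Data and Its Applications. Journal of Software (软件学报), 2013, 24(10): 2340-2353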

History
  • Received: 2012-06-11
  • Last revised: 2013-02-04
  • Published online: 2013-10-12