Unsupervised Domain Adaptation Method with Augmentation Technology

CLC number: TP18

Funding: National Natural Science Foundation of China (61976067, 62271036)

    Abstract:

    Domain adaptation (DA) is a class of machine learning tasks in which the training set (source domain) and the test set (target domain) follow different distributions. The central challenge is to overcome the negative impact of this distribution shift on the generalization ability of the classifier, that is, to design an effective training strategy that yields a classifier with high generalization performance by minimizing the difference between the data domains. This study addresses unsupervised DA (UDA), where annotations are available in the source domain but absent in the target domain. The task is formulated as a semi-supervised learning problem: how to train a classifier from partially annotated samples together with the remaining unannotated samples. Two semi-supervised learning techniques, pseudo labels (PLs) and consistency regularization (CR), are then introduced to purposefully annotate and augment the samples in the observed domains, and the classifier is learned from the augmented training samples, which gives it good generalization on UDA tasks. The proposed augmentation-based UDA (A-UDA) method trains the classifier in three steps. First, the unannotated samples in the target domain are augmented by random augmentation (sample augmentation). Second, high-confidence unannotated samples are assigned pseudo labels based on the predictions of the model (annotation augmentation). Finally, the classification model is trained on the augmented data set, with the maximum mean discrepancy (MMD) used to measure the distribution distance between the source domain and the target domain; minimizing this distance yields a classifier with high generalization performance. The proposed method is evaluated on multiple UDA tasks, including MNIST-USPS, Office-Home, and ImageCLEF-DA, and achieves better classification performance than the existing methods it is compared with.
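Two of the ingredients named in the abstract, confidence-based pseudo-labeling and an MMD-style distribution distance, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names, the confidence threshold of 0.95, and the Gaussian kernel with bandwidth parameter `gamma` are all assumptions made for illustration, since the abstract does not specify them.

```python
import numpy as np


def select_pseudo_labels(probs, threshold=0.95):
    """Pick samples whose top predicted class probability reaches the threshold.

    probs: (n_samples, n_classes) array of predicted class probabilities.
    threshold: hypothetical confidence cut-off (not taken from the paper).
    Returns the indices of the selected samples and their pseudo labels.
    """
    probs = np.asarray(probs, dtype=float)
    confidence = probs.max(axis=1)        # highest class probability per sample
    labels = probs.argmax(axis=1)         # predicted class per sample
    keep = np.flatnonzero(confidence >= threshold)
    return keep, labels[keep]


def gaussian_kernel(a, b, gamma):
    """Gaussian kernel matrix exp(-gamma * ||a_i - b_j||^2)."""
    sq_dist = (
        (a ** 2).sum(axis=1)[:, None]
        + (b ** 2).sum(axis=1)[None, :]
        - 2.0 * a @ b.T
    )
    # Clip tiny negative values caused by floating-point rounding.
    return np.exp(-gamma * np.maximum(sq_dist, 0.0))


def mmd2(source, target, gamma=1.0):
    """Biased estimate of the squared MMD between two feature sets.

    The Gaussian kernel and gamma are assumptions for this sketch.
    """
    k_ss = gaussian_kernel(source, source, gamma).mean()
    k_tt = gaussian_kernel(target, target, gamma).mean()
    k_st = gaussian_kernel(source, target, gamma).mean()
    return k_ss + k_tt - 2.0 * k_st


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    src = rng.normal(size=(50, 2))            # stand-in for source features
    tgt = rng.normal(size=(50, 2)) + 3.0      # shifted stand-in for target features
    print("squared MMD:", mmd2(src, tgt))

    probs = [[0.98, 0.02], [0.60, 0.40], [0.03, 0.97]]
    print("pseudo labels:", select_pseudo_labels(probs))
```

In a training loop of the kind the abstract outlines, a term like `mmd2` would be computed on source and target features and minimized alongside the classification loss, while `select_pseudo_labels` would decide which target samples contribute labeled examples. How the two are weighted and scheduled is not stated in the abstract.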

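Cite this article

Cao Yi, Guo Maozu, Wu Weining. Unsupervised Domain Adaptation Method with Augmentation Technology. Journal of Software (软件学报), ,():1-18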

History
  • Received: 2023-04-20
  • Last revised: 2023-09-01
  • Published online: 2024-08-28