Unsupervised Domain Adaptation Method with Augmentation Technology

doi:10.13328/j.cnki.jos.007233

Authors:
CAO Yi, College of Computer Science and Technology, Harbin Engineering University, Harbin 150001, China
GUO Mao-Zu, School of Electrical and Information Engineering, Beijing University of Civil Engineering and Architecture, Beijing 102616, China
WU Wei-Ning, College of Computer Science and Technology, Harbin Engineering University, Harbin 150001, China

CLC number: TP18
Funding: National Natural Science Foundation of China (61976067, 62271036)

Abstract:

Domain adaptation (DA) is a class of machine learning tasks in which the training set (source domain) and the test set (target domain) follow different data distributions. The core problem is how to overcome the negative impact of this distribution shift on the classifier's generalization, that is, how to design an effective training strategy that yields a classifier with high generalization performance by minimizing the discrepancy between the domains. This study addresses unsupervised domain adaptation (UDA), where labels are available in the source domain but absent in the target domain. The task is formulated as a semi-supervised learning problem: training a classifier from partially labeled samples together with the remaining unlabeled samples. Two semi-supervised learning techniques, pseudo labels (PLs) and consistency regularization (CR), are then introduced to label and augment the observed domains in a targeted way, and the classifier is learned from the augmented training samples, which improves its generalization on UDA tasks. The proposed augmentation-based UDA (A-UDA) method trains the classifier in three steps. First, the unlabeled samples in the target domain are expanded with random data augmentation (sample augmentation). Second, pseudo labels are assigned to unlabeled samples that the model predicts with high confidence (label augmentation). Finally, the classification model is trained on the augmented data set, with the distribution distance between the source and target domains measured by the maximum mean discrepancy (MMD) and minimized during training to obtain a classifier with high generalization performance. The method is evaluated on several UDA tasks, including MNIST-USPS, Office-Home, and ImageCLEF-DA, where it achieves better classification performance than the existing methods it is compared with.

Key words: unsupervised domain adaptation (UDA); semi-supervised learning (SSL); data augmentation; pseudo label (PL); consistency regularization (CR)
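
The three steps named in the abstract (sample augmentation, label augmentation, and an MMD-based distribution distance) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the Gaussian-noise augmentation, the confidence threshold, the Gaussian kernel, and every function name below are assumptions chosen only to make the ideas concrete.

```python
import numpy as np


def softmax(logits):
    """Row-wise softmax with the usual max-subtraction for stability."""
    z = logits - logits.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


def random_augment(x, rng, noise_std=0.05):
    """Sample augmentation (assumed form): add small Gaussian noise.

    The abstract only says "random augmentation"; image pipelines would
    normally use crops, flips, colour jitter and similar transforms.
    """
    return x + rng.normal(0.0, noise_std, size=x.shape)


def select_pseudo_labels(probs, threshold=0.95):
    """Label augmentation: keep samples whose top probability >= threshold.

    Returns the kept row indices and their predicted class labels.
    The threshold value is a hypothetical choice.
    """
    confidence = probs.max(axis=1)
    keep = np.flatnonzero(confidence >= threshold)
    return keep, probs[keep].argmax(axis=1)


def gaussian_kernel(a, b, sigma=1.0):
    """Gaussian kernel matrix built from pairwise squared distances."""
    sq = (a ** 2).sum(1)[:, None] + (b ** 2).sum(1)[None, :] - 2.0 * a @ b.T
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * sigma ** 2))


def mmd2(source, target, sigma=1.0):
    """Biased estimate of the squared MMD between two feature sets."""
    return (gaussian_kernel(source, source, sigma).mean()
            + gaussian_kernel(target, target, sigma).mean()
            - 2.0 * gaussian_kernel(source, target, sigma).mean())


rng = np.random.default_rng(0)
source = rng.normal(0.0, 1.0, size=(200, 2))   # stand-in source features
target = rng.normal(1.5, 1.0, size=(200, 2))   # shifted target features
target_aug = random_augment(target, rng)       # step 1: sample augmentation

# Stand-in model outputs for three target samples.
probs = softmax(np.array([[4.0, 0.0, 0.0],
                          [0.2, 0.1, 0.0],
                          [0.0, 0.0, 5.0]]))
idx, labels = select_pseudo_labels(probs, threshold=0.9)  # step 2

print("pseudo-labelled rows:", idx, "labels:", labels)
print("squared MMD, source vs target:", mmd2(source, target))  # step 3
```

In a full training loop the squared MMD would be computed on network features and added to the classification loss, so that minimizing the total loss also pulls the two feature distributions together; the weighting between the terms is not specified in the abstract.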

Cite this article

CAO Yi, GUO Mao-Zu, WU Wei-Ning. Unsupervised Domain Adaptation Method with Augmentation Technology. Journal of Software,,():1-18

History

Received: 2023-04-20
Last revised: 2023-09-01
Published online: 2024-08-28
