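Multi-source Domain Adaptation of Weighted Disentangled Semantic Representation

About the authors:

Cai Ruichu (蔡瑞初, born 1983), male, Ph.D., professor, doctoral supervisor, CCF senior member; main research interests: causal discovery, graph neural networks, domain adaptation, and natural language processing. Li Zijian (李梓健, born 1994), male, Ph.D. candidate; main research interest: transfer learning. Zheng Lijuan (郑丽娟, born 1996), female, master's degree; main research interests: domain adaptation and multi-source domain transfer learning.

Corresponding author:

Cai Ruichu, E-mail: cairuichu@gmail.com

CLC number:

TP181

Funding:

National Natural Science Foundation of China (61876043, 61976052); Science and Technology Program of Guangzhou (201902010058)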

    Abstract:

    Recent years have witnessed the widespread use of deep learning. Though deep learning has achieved strong performance in many fields, it requires large amounts of labeled data, and the high cost of collecting and annotating such data hinders its wider application. Transfer learning relaxes the assumption that training and test data are independent and identically distributed (i.i.d.), and uses labeled source-domain data together with unlabeled target-domain data to train models that generalize well, which makes it an important direction for extending the application scenarios of deep learning. Among transfer learning settings, multi-source domain adaptation takes full advantage of the information in multiple source domains and is therefore well suited to real-world applications. This study proposes a multi-source domain adaptation method based on a multi-measure (multiple distance metric) framework and a weighted disentangled semantic representation. Motivated by a causal view of the data generation process, it assumes that the observed samples are generated jointly by two independent groups of latent variables: semantic latent variables and domain latent variables. To extract these variables, a dual adversarial training scheme is used to extract and disentangle the semantic information and the domain information. For multi-domain aggregation, three different aggregation strategies are employed to obtain a weighted domain-invariant semantic representation. Finally, this representation is used for image classification. Comparative experiments and robustness analyses show that the proposed method yields state-of-the-art performance on several multi-source domain adaptation benchmark datasets and validate its robustness.
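
    The abstract does not spell out the three aggregation strategies or the distance measures, so the following is only a rough sketch of what weighting multiple source domains could look like, not the authors' method. It assumes each source domain is weighted by a softmax over the negative Euclidean distance between its mean semantic feature and the target's mean feature, and that per-source class probabilities are then averaged with those weights. All names (`source_weights`, `aggregate`, `temperature`) and the toy data are hypothetical.

```python
import math

def mean_vector(rows):
    """Column-wise mean of a list of equal-length feature vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]

def euclidean(u, v):
    """Euclidean distance between two vectors (one assumed distance measure)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def source_weights(source_feats, target_feats, temperature=1.0):
    """Softmax over negative source-target distances: closer sources weigh more."""
    t_mean = mean_vector(target_feats)
    scores = [-euclidean(mean_vector(s), t_mean) / temperature for s in source_feats]
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]

def aggregate(per_source_probs, weights):
    """Weighted average of per-source class-probability vectors."""
    k = len(per_source_probs[0])
    return [sum(w * p[c] for w, p in zip(weights, per_source_probs)) for c in range(k)]

# Toy data (assumption): two source domains, 2-D semantic features.
source_feats = [
    [[0.0, 0.0], [0.2, 0.0]],  # source 1: close to the target
    [[3.0, 4.0], [3.0, 4.0]],  # source 2: far from the target
]
target_feats = [[0.1, 0.0], [0.1, 0.0]]

weights = source_weights(source_feats, target_feats)
# Class probabilities predicted for one target sample by each source's classifier.
combined = aggregate([[0.9, 0.1], [0.2, 0.8]], weights)
```

    In this toy setup the first source lies closer to the target, so it receives the larger weight and dominates the combined prediction. A learned or adversarially trained weighting would replace the fixed distance used here.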

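Cite this article

Cai Ruichu, Zheng Lijuan, Li Zijian. Multi-source Domain Adaptation of Weighted Disentangled Semantic Representation. Journal of Software (软件学报), 2022, 33(12): 4517-4533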

History
  • Received: 2020-11-24
  • Last revised: 2021-03-16
  • Accepted:
  • Published online: 2022-12-03
  • Publication date: 2022-12-06