Multi-modal Reliability-aware Affective Computing

Authors: 罗佳敏, 王晶晶, 周国栋
Affiliation:
Author biography:
Corresponding author:
CLC number: TP18
Fund projects: National Natural Science Foundation of China (62006166, 62076175, 62076176); Priority Academic Program Development of Jiangsu Higher Education Institutions

Abstract:

Multi-modal affective computing is a fundamental and important research task in affective computing, which aims to understand the sentiment of user-generated videos from multi-modal signals. Although existing multi-modal affective computing approaches achieve good performance on benchmark datasets, they generally ignore the problem of modality reliability bias, whether they focus on designing complex fusion strategies or on learning modality representations. This study argues that, compared with text, the acoustic and visual modalities often express sentiment more faithfully; in affective computing tasks, the acoustic and visual modalities are therefore of high reliability, while the textual modality is of low reliability. However, the feature-extraction tools used for the different modalities differ in capability, so textual representations are usually stronger than acoustic and visual ones (e.g., GPT-3 vs. ResNet). This further aggravates the modality reliability bias and hinders accurate sentiment prediction. To alleviate this bias, this study proposes a model-agnostic multi-modal reliability-aware affective computing approach (MRA) based on cumulative learning. MRA captures the bias with a separate text-only branch and, during training, gradually shifts the model's focus from the sentiment expressed by the low-reliability textual modality to the sentiment expressed by the high-reliability acoustic and visual modalities, thereby mitigating the inaccurate predictions caused by the low-reliability textual modality. Extensive comparative experiments on multiple benchmark datasets show that MRA effectively highlights the importance of the high-reliability acoustic and visual modalities and alleviates the bias of the low-reliability textual modality. Moreover, as a model-agnostic method, MRA significantly improves the performance of existing multi-modal affective computing models, demonstrating its effectiveness and generality in multi-modal affective computing tasks.
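The approach summarized above hinges on a cumulative-learning objective that re-weights a text-only (bias) branch against the full multimodal prediction over the course of training. The following is a minimal, hypothetical sketch of how such an objective could be wired up; it assumes a PyTorch regression setting, a generic multimodal backbone `fusion_model`, a text-only branch `text_branch`, an L1 criterion, and a quadratic decay schedule. These names and the exact schedule are illustrative assumptions, not the authors' released implementation.

```python
# Illustrative sketch (not the paper's official code): cumulative learning that
# shifts emphasis from a text-only bias branch to the fused multimodal prediction.
import torch
import torch.nn as nn


class MRASketch(nn.Module):
    """Wraps any multimodal backbone with a separate text-only branch."""

    def __init__(self, fusion_model: nn.Module, text_branch: nn.Module):
        super().__init__()
        self.fusion_model = fusion_model  # assumed backbone over text + audio + vision
        self.text_branch = text_branch    # assumed branch that sees only the text input

    def forward(self, text, audio, vision):
        y_fused = self.fusion_model(text, audio, vision)  # multimodal prediction
        y_text = self.text_branch(text)                   # text-only (bias) prediction
        return y_fused, y_text


def cumulative_weight(epoch: int, total_epochs: int) -> float:
    """Quadratic decay: close to 1 early (fit the text bias), close to 0 late
    (fit the fused acoustic/visual/textual prediction)."""
    return 1.0 - (epoch / max(total_epochs, 1)) ** 2


def mra_loss(y_fused, y_text, target, alpha, criterion=nn.L1Loss()):
    """Interpolate the two branch losses with the cumulative weight alpha."""
    return alpha * criterion(y_text, target) + (1.0 - alpha) * criterion(y_fused, target)
```

In this sketch, `alpha = cumulative_weight(epoch, total_epochs)` would be recomputed once per epoch, so early updates mostly fit the text-bias branch while later updates emphasize the fused prediction; at inference time only the fused output would be used, since the text branch exists solely to absorb the low-reliability textual bias during training.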

Cite this article:

罗佳敏, 王晶晶, 周国栋. 多模态可信度感知的情感计算 (Multi-modal reliability-aware affective computing). 软件学报 (Journal of Software), (): 1-17.

History
  • Received: 2023-04-03
  • Revised: 2023-07-06
  • Accepted:
  • Published online: 2024-05-08
  • Issue date: