Survey on Multimodal Visual Language Representation Learning

doi:10.13328/j.cnki.jos.006125

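About the authors: Du Pengfei (杜鹏飞), born 1985, male, Ph.D. candidate. Research interests: artificial intelligence, affective computing, and network security.
Li Xiaoyong (李小勇), born 1975, male, Ph.D., professor, doctoral supervisor, CCF senior member. Research interests: network security and trusted service engineering.
Gao Yali (高雅丽), born 1991, female, Ph.D., CCF professional member. Research interests: network security and trusted service engineering.
Corresponding author: Li Xiaoyong, E-mail: lxyxjtu@163.com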
Funding: National Natural Science Foundation of China (U1836215)

Abstract:

We live in a multimedia world built from content in many different modalities. Information from different modalities is highly correlated and complementary, and the main goal of multimodal representation learning is to mine the commonalities and distinctive characteristics of those modalities and to produce latent vectors that can represent multimodal information. This survey mainly introduces research on visual-language representation, which is currently widely applied, covering both traditional methods based on similarity models and the current mainstream pre-training methods based on language models. A currently favored approach is to semanticize visual features and then pass them, together with text features, through a powerful feature extractor to produce representations. The Transformer serves as the main feature extractor across the various representation learning tasks. The survey covers the research background, the categorization of research methods, evaluation methods, and future development trends.

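Cite this article

Du Pengfei, Li Xiaoyong, Gao Yali. Survey on multimodal visual language representation learning. Journal of Software (软件学报), 2021, 32(2): 327-348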

History

Received: 2020-05-11
Last revised: 2020-06-26
Published online: 2020-09-10
Published: 2021-02-06
