Survey on Multimodal Visual Language Representation Learning
Author: Du Pengfei, Li Xiaoyong, Gao Yali

Fund Project: National Natural Science Foundation of China (U1836215)

    Abstract:

    The multimedia world in which humans live is built from a vast amount of content in different modalities, and the information carried by these modalities is highly correlated and complementary. The main goal of multimodal representation learning is to mine both the commonalities and the distinctive characteristics of different modalities and to produce latent vectors that can represent multimodal information. This survey reviews research on the widely used visual language representations, covering both traditional methods based on similarity models and the current mainstream pre-training methods based on language models. At present, the more effective approach is to semanticize visual features and then combine them with textual features through a powerful feature extractor to generate joint representations. The Transformer has become the mainstream network architecture across representation learning tasks. The survey discusses the research background, a categorization of existing studies, evaluation methods, and future development trends.
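The abstract contrasts similarity-model methods with Transformer-based pre-training. As a rough illustration of the similarity-model family, the sketch below projects image and text features into a shared space, compares them by cosine similarity, and scores a batch with a bidirectional hinge ranking loss. Everything here is an assumption for illustration, not the survey's formulation: the helper names `embed` and `ranking_loss`, the toy projection matrices `W_IMG` and `W_TXT`, the margin of 0.2, and the choice to sum over all in-batch negatives are hypothetical. Real systems learn the projections on top of image and text encoders.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def project(weights: Matrix, x: Vector) -> Vector:
    """Linear map W @ x into the shared embedding space (no bias, for brevity)."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]


def l2_normalize(v: Vector) -> Vector:
    """Scale v to unit length; a zero vector is returned unchanged."""
    norm = math.sqrt(sum(a * a for a in v)) or 1.0
    return [a / norm for a in v]


def embed(weights: Matrix, x: Vector) -> Vector:
    """Hypothetical helper: project a modality-specific feature, then normalize."""
    return l2_normalize(project(weights, x))


def cosine(u: Vector, v: Vector) -> float:
    """For unit-length vectors, cosine similarity is just the dot product."""
    return sum(a * b for a, b in zip(u, v))


def ranking_loss(img_emb: List[Vector], txt_emb: List[Vector], margin: float = 0.2) -> float:
    """Bidirectional hinge ranking loss summed over in-batch negatives.

    img_emb[i] and txt_emb[i] form a matching pair; every j != i is a negative.
    """
    n = len(img_emb)
    loss = 0.0
    for i in range(n):
        pos = cosine(img_emb[i], txt_emb[i])
        for j in range(n):
            if j == i:
                continue
            # image i should score its own caption above caption j by the margin
            loss += max(0.0, margin - pos + cosine(img_emb[i], txt_emb[j]))
            # caption i should score its own image above image j by the margin
            loss += max(0.0, margin - pos + cosine(img_emb[j], txt_emb[i]))
    return loss


# Toy, hypothetical projections: 3-d image features and 2-d text features -> 2-d space.
W_IMG = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
W_TXT = [[1.0, 0.0], [0.0, 1.0]]

images = [embed(W_IMG, [1.0, 0.0, 5.0]), embed(W_IMG, [0.0, 1.0, 7.0])]
captions = [embed(W_TXT, [2.0, 0.0]), embed(W_TXT, [0.0, 3.0])]

print(ranking_loss(images, captions))        # 0.0: each true pair outranks its negatives
print(ranking_loss(images, captions[::-1]))  # ≈ 4.8 once the captions are swapped
```

Summing the hinge over every negative in the batch is only one common choice; a frequently used variant penalizes just the hardest negative for each anchor, which concentrates the gradient on the most confusable pairs.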


Citation: Du Pengfei, Li Xiaoyong, Gao Yali. Survey on multimodal visual language representation learning. Journal of Software, 2021, 32(2): 327-348.

History
  • Received: May 11, 2020
  • Revised: June 26, 2020
  • Online: September 10, 2020
  • Published: February 06, 2021
Copyright: Institute of Software, Chinese Academy of Sciences. Beijing ICP No. 05046678-4
Address: No. 4 South Fourth Street, Zhong Guan Cun, Beijing, Postal Code: 100190
Phone: 010-62562563  Fax: 010-62562533  Email: jos@iscas.ac.cn