Survey on Vision-language Pre-training

doi:10.13328/j.cnki.jos.006774

Volume 34, Issue 5, 2023, pp. 2000-2023


Abstract:

In recent years, deep learning has achieved excellent performance in unimodal fields such as computer vision (CV) and natural language processing (NLP). As the technology has developed, the importance and necessity of multimodal learning have become apparent. Vision-language learning, an essential part of multimodal learning, has received extensive attention from researchers in and outside China. Thanks to the development of the Transformer framework, more and more pre-trained models are being applied to vision-language multimodal learning, and performance on related tasks has improved qualitatively. This survey systematically reviews current work on vision-language pre-trained models. First, background knowledge on pre-trained models is introduced. Second, the structure of pre-trained models is analyzed and compared from two perspectives. Commonly used vision-language pre-training techniques are then discussed, and five categories of downstream pre-training tasks are described in detail. Finally, the common datasets used in image and video pre-training tasks are presented, and the performance of commonly used pre-trained models is compared and analyzed across datasets and tasks.

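Citation:

Yin Jiong, Zhang Zhedong, Gao Yuhan, Yang Zhiwen, Li Liang, Xiao Mang, Sun Yaoqi, Yan Chenggang. Survey on Vision-language Pre-training. Journal of Software (软件学报), 2023, 34(5): 2000-2023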

History

Received: April 18, 2022
Revised: May 29, 2022
Adopted:
Online: September 20, 2022
Published: May 06, 2023

