Visual Scene Description and Its Performance Evaluation

DOI: 10.13328/j.cnki.jos.005665


Journal of Software, Volume 30, Issue 4, 2019: 867-883


Authors:
MA Miao
Key Laboratory of Modern Teaching Technology of Ministry of Education (Shaanxi Normal University), Xi'an 710062, China; School of Computer Science, Shaanxi Normal University, Xi'an 710119, China
WANG Bo-Long
School of Computer Science, Shaanxi Normal University, Xi'an 710119, China
WU Qi
School of Computer Science, The University of Adelaide, Adelaide SA 5005, Australia
WU Jie
School of Computer Science, Shaanxi Normal University, Xi'an 710119, China
GUO Min
School of Computer Science, Shaanxi Normal University, Xi'an 710119, China

                    
Affiliation:
Clc Number:
Fund Project: National Natural Science Foundation of China (61877038, 61801282, 61601274); Natural Science Foundation of Shaanxi Province, China (2018JM6068); Fundamental Research Funds for the Central Universities of Shaanxi Normal University (GK201703054, GK201703058)


Abstract:

Visual scene description is a cross-domain research topic spanning computer vision, multimedia, artificial intelligence, and natural language processing. Its goal is to automatically generate one or more sentences that describe the content of a visual scene in an image or a video clip. The richness of visual scene content and the diversity of natural language expression make visual scene description a challenging task. This paper reviews the generation methods and performance evaluation of recently developed visual scene description approaches. Specifically, the research object and main tasks of visual scene description are first defined. Its relationships with related technologies, such as multi-modal retrieval, cross-modal learning, scene classification, and visual relationship detection, are then discussed in turn. Next, the main methods and research progress in visual scene description are summarized in three categories, and the growing number of benchmark datasets is reviewed. In addition, widely used evaluation metrics for visual scene description and the challenges they face are discussed. Finally, potential future applications are suggested.

Key words: deep learning; image captioning; video captioning; benchmark dataset; performance evaluation

Citation:

MA Miao, WANG Bo-Long, WU Qi, WU Jie, GUO Min. Visual scene description and its performance evaluation. Journal of Software, 2019, 30(4): 867-883



History

Received: April 15, 2018
Revised: June 13, 2018
Online: April 01, 2019

Copyright: Institute of Software, Chinese Academy of Sciences
Address: 4# South Fourth Street, Zhong Guan Cun, Beijing 100190, China
Phone: 010-62562563 Fax: 010-62562533 Email: jos@iscas.ac.cn
