Image Captioning Based on Visual Relevance and Context Dual Attention

doi:10.13328/j.cnki.jos.006623

Journal of Software, Volume 33, Issue 9, 2022, pp. 3210-3222

Authors:
LIU Mao-Fu, School of Computer Science and Technology, Wuhan University of Science and Technology, Wuhan 430065, China
SHI Qi, School of Computer Science and Technology, Wuhan University of Science and Technology, Wuhan 430065, China
NIE Li-Qiang, School of Computer Science and Technology, Shandong University, Qingdao 266237, China

CLC Number: TP391

Abstract:

Image captioning has great theoretical significance and practical value, and it has attracted wide attention in both computer vision and natural language processing. Existing attention-based image captioning methods combine the current word with the visual cues at the same time step to generate the target word. However, they neglect visual relevance and contextual information, which causes the generated captions to differ from the ground truth. To address this problem, this paper presents the visual relevance and context dual attention (VRCDA) method. The visual relevance attention incorporates the attention vector from the previous time step into traditional visual attention to ensure visual relevance. The context attention obtains more complete semantic information from the global context so that the context is better exploited. The final image caption is thus generated from both visual relevance and contextual information. Experiments on the MSCOCO and Flickr30k benchmark datasets demonstrate that VRCDA effectively describes image semantics. Compared with several state-of-the-art image captioning methods, it also achieves superior performance on all evaluation metrics.
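The abstract describes the two attention branches only at a high level, so what follows is a rough, hypothetical sketch rather than the VRCDA formulation. It assumes the additive (Bahdanau-style) scoring common in attention-based captioning. It shows two things:

- one way the previous step's attended visual vector (`c_prev`) could be added to the visual attention score;
- a second attention that runs over the decoder's earlier hidden states, treated here as the "global context".

The function names and the projection matrices (`W_v`, `W_h`, `W_c`, `W_a`, `W_q`) are illustrative assumptions. So are the weight vectors (`w`, `u`) and the choice of decoder states as context. None of them are taken from the paper.

```python
import math
from typing import List, Tuple

Vector = List[float]
Matrix = List[List[float]]


def matvec(m: Matrix, v: Vector) -> Vector:
    """Multiply a matrix (given as a list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def add(*vs: Vector) -> Vector:
    """Element-wise sum of equally sized vectors."""
    return [sum(xs) for xs in zip(*vs)]


def softmax(scores: Vector) -> Vector:
    """Numerically stable softmax over a list of scores."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def weighted_sum(weights: Vector, vectors: List[Vector]) -> Vector:
    """Attention read-out: sum_i weights[i] * vectors[i]."""
    dim = len(vectors[0])
    return [sum(w * v[k] for w, v in zip(weights, vectors)) for k in range(dim)]


def additive_attention(keys: List[Vector], query_term: Vector,
                       W_k: Matrix, w: Vector) -> Tuple[Vector, Vector]:
    """Additive scoring e_i = w . tanh(W_k k_i + query_term), then softmax."""
    scores = []
    for k in keys:
        hidden = add(matvec(W_k, k), query_term)
        scores.append(sum(wj * math.tanh(x) for wj, x in zip(w, hidden)))
    weights = softmax(scores)
    return weights, weighted_sum(weights, keys)


def visual_relevance_attention(regions, h_t, c_prev, W_v, W_h, W_c, w):
    """Hypothetical: the attended visual vector from step t-1 (c_prev) is
    projected and added to the query alongside the decoder state h_t."""
    query_term = add(matvec(W_h, h_t), matvec(W_c, c_prev))
    return additive_attention(regions, query_term, W_v, w)


def context_attention(history, h_t, W_a, W_q, u):
    """Hypothetical: attend over the decoder states of the steps so far,
    treating them as the 'global context' of the partial caption."""
    return additive_attention(history, matvec(W_q, h_t), W_a, u)


# Toy example with 2-dimensional features and identity projections.
I2 = [[1.0, 0.0], [0.0, 1.0]]
regions = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]  # three image-region features
h_t = [0.2, -0.1]                               # current decoder state
c_prev = [1.0, 0.0]                             # attended vector from step t-1
alpha, c_t = visual_relevance_attention(regions, h_t, c_prev, I2, I2, I2, [1.0, 1.0])

history = [[0.1, 0.3], [0.2, -0.1]]             # decoder states h_1 and h_2 (= h_t)
beta, s_t = context_attention(history, h_t, I2, I2, [1.0, 1.0])
```

Feeding `c_prev` through its own projection lets the current attention weights depend on what the model attended to at the previous step, which is one reading of "visual relevance". The sketch does not say how the two attended vectors (`c_t`, `s_t`) are fused before predicting the next word, because the abstract does not specify it. Plain lists keep the example dependency-free. A real model would learn these matrices and operate on CNN region features.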

Key words: image captioning; dual attention mechanism; visual relevance attention; context attention

Citation:

LIU Mao-Fu, SHI Qi, NIE Li-Qiang. Image captioning method based on visual relevance and context dual attention. Journal of Software, 2022, 33(9): 3210-3222

History

Received: June 03, 2021
Revised: August 15, 2021
Online: February 22, 2022
Published: September 06, 2022

Copyright: Institute of Software, Chinese Academy of Sciences
Address: 4# South Fourth Street, Zhong Guan Cun, Beijing 100190
Phone: 010-62562563 Fax: 010-62562533 Email: jos@iscas.ac.cn