Image Captioning Based on Visual Relevance and Context Dual Attention

doi:10.13328/j.cnki.jos.006623

2022, Vol. 33, No. 9, pp. 3210-3222. DOI: 10.13328/j.cnki.jos.006623

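Authors: Liu Maofu, Shi Qi, Nie Liqiang
About the authors: Liu Maofu (1977-), male, Ph.D., professor, doctoral supervisor, CCF senior member; main research area: natural language processing. Shi Qi (1997-), male, master's student; main research area: natural language processing. Nie Liqiang (1985-), male, Ph.D., professor, doctoral supervisor, CCF senior member; main research area: multimedia content analysis and search.
Corresponding author: Nie Liqiang, E-mail: nieliqiang@gmail.com
CLC number: TP391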

Abstract:

Image captioning has significant theoretical and practical value and has attracted wide attention in both computer vision and natural language processing. Existing attention-based image captioning methods fuse the current word with visual information at each time step to generate the target word, but they neglect visual coherence between time steps and contextual information, which leads to differences between the generated caption and the reference caption. To address this problem, this paper proposes an image captioning method based on visual relevance and context dual attention (VRCDA). The visual relevance attention adds the attention vector of the previous time step to traditional visual attention to ensure visual coherence, and the context attention obtains more complete semantic information from the global context to make better use of it; together they guide the generation of the final caption. Experiments on the MSCOCO and Flickr30k benchmark datasets show that VRCDA describes image semantics effectively and, compared with mainstream image captioning methods, improves on all evaluation metrics.
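
The abstract gives no equations, so the following is only a minimal sketch of the two attention ideas it names, assuming plain scaled dot-product scoring with no learned projections. The function names, the `mix` blending weight, and the requirement that all vectors share one dimension are assumptions made for illustration; they are not the paper's formulation.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def weighted_sum(weights, vectors):
    """Convex combination of equally sized vectors."""
    dim = len(vectors[0])
    return [sum(w * vec[i] for w, vec in zip(weights, vectors)) for i in range(dim)]


def visual_relevance_attention(regions, hidden, prev_attended, mix=0.5):
    """Attend over image region features.

    Assumption: the query blends the decoder state with the attended
    vector from the previous time step, so consecutive steps tend to
    look at related regions. `mix` is a hypothetical blending weight.
    """
    query = [(1 - mix) * h + mix * p for h, p in zip(hidden, prev_attended)]
    scale = math.sqrt(len(query))
    weights = softmax([dot(query, r) / scale for r in regions])
    return weights, weighted_sum(weights, regions)


def context_attention(word_states, hidden):
    """Attend over the states of all words generated so far (global context)."""
    scale = math.sqrt(len(hidden))
    weights = softmax([dot(hidden, s) / scale for s in word_states])
    return weights, weighted_sum(weights, word_states)


# Toy example: three 2-d region features and a 2-d decoder state.
regions = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
hidden = [1.0, 0.0]

# Without history, the previous attended vector is all zeros.
w_first, v_first = visual_relevance_attention(regions, hidden, [0.0, 0.0])

# If the previous step attended to the second region, that region
# receives more weight at the current step.
w_next, v_next = visual_relevance_attention(regions, hidden, [0.0, 1.0])

# Context attention over two hypothetical earlier word states.
w_ctx, v_ctx = context_attention([[1.0, 0.0], [0.0, 1.0]], hidden)
```

In a trained model the scores would come from learned projections and the two attended vectors would feed the decoder that predicts the next word; the sketch only shows how a previous attention vector can bias the current visual attention and how a second attention can summarize the already generated words.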

Cite this article

Liu Maofu, Shi Qi, Nie Liqiang. Image captioning based on visual relevance and context dual attention. Journal of Software, 2022, 33(9): 3210-3222

History

Received: 2021-06-03
Last revised: 2021-08-15
Published online: 2022-02-22
Publication date: 2022-09-06
