Configurable Text-based Image Editing by Autoencoder-based Generative Adversarial Networks

doi:10.13328/j.cnki.jos.006622

Authors:
WU Fu-Xiang
Guangdong Provincial Key Laboratory of Robotics and Intelligent System, Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen 518055, China
CHENG Jun
Guangdong Provincial Key Laboratory of Robotics and Intelligent System, Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen 518055, China

Author biographies: WU Fu-Xiang (1984-), male, Ph.D., assistant researcher, CCF professional member; his research interests include multimodal deep learning, text-to-image synthesis, and natural language processing. CHENG Jun (1977-), male, Ph.D., researcher, doctoral supervisor; his research interests include machine vision, robotics, and machine intelligence and control.
Corresponding author: CHENG Jun, E-mail: jun.cheng@siat.ac.cn
CLC number: TP391
Fund project: National Natural Science Foundation of China (U21A20487); Shenzhen Basic Research Program (JCYJ20200109113416531, JCYJ20180507182610734); Key Technology Talent Program of the Chinese Academy of Sciences

Abstract:

Text-based image editing is an active research topic in multimedia with significant application value. The task is challenging because the source image must be edited according to a given text, and the cross-modal gap between text and images is large. Existing methods offer little direct control over, or correction of, the editing process. Yet image editing is driven by user preference, and better controllability lets users bypass or strengthen individual editing modules to obtain the results they prefer. To address this problem, this study proposes an autoencoder-based model for text-based image editing. First, to provide a convenient and direct interface for interactive configuration and editing, an autoencoder is introduced into stacked generative adversarial networks (SGANs). The autoencoder unifies the high-dimensional feature spaces between stages into the color space, so that intermediate editing results can be corrected directly in that color space. Second, to enhance the details of the edited image and improve controllability, a symmetric detail correction module is constructed. It takes the source image and the edited image as symmetric, exchangeable inputs and fuses text features to correct the previously produced edited image. Experiments on the MS-COCO and CUB200 datasets show that the proposed model can automatically edit images from language descriptions while allowing the editing results to be corrected conveniently.

Key words: text-based image editing; generative adversarial networks (GANs); interactive editing
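
The color-space interface described in the abstract can be illustrated with a toy sketch. It is not the authors' implementation: the class `ColorSpaceBridge`, the functions `run_stage` and `edit`, the `skip` and `corrections` parameters, and the constant `FEAT_DIM` are all hypothetical names, the weights are random and untrained, and each "stage" is a placeholder for a text-conditioned generator. The only point it makes is structural: because every stage hands an RGB image to the next one, a stage can be bypassed or its output corrected in place before editing continues.

```python
import math
import random

FEAT_DIM = 8  # hypothetical number of feature channels per pixel


def matvec(vec, mat):
    """Multiply a row vector by a matrix stored as a list of rows."""
    return [sum(v * row[j] for v, row in zip(vec, mat))
            for j in range(len(mat[0]))]


class ColorSpaceBridge:
    """Toy linear autoencoder between RGB and a stage's feature space.

    Assumption for illustration: random, untrained weights stand in for
    a learned encoder/decoder pair.
    """

    def __init__(self, seed):
        r = random.Random(seed)
        self.from_rgb = [[r.gauss(0, 0.5) for _ in range(FEAT_DIM)]
                         for _ in range(3)]
        self.to_rgb = [[r.gauss(0, 0.5) for _ in range(3)]
                       for _ in range(FEAT_DIM)]

    def encode(self, pixel):
        # RGB pixel (3 values) -> feature vector (FEAT_DIM values)
        return matvec(pixel, self.from_rgb)

    def decode(self, feat):
        # Feature vector -> RGB pixel, squashed into (0, 1) by a sigmoid
        return [1.0 / (1.0 + math.exp(-x)) for x in matvec(feat, self.to_rgb)]


def run_stage(feat, text_vec):
    """Placeholder for one text-conditioned editing stage."""
    return [math.tanh(f + t) for f, t in zip(feat, text_vec)]


def edit(image, text_vec, bridges, skip=(), corrections=None):
    """Run the stages in order; `image` is a list of [r, g, b] pixels.

    skip:        indices of stages to bypass entirely.
    corrections: maps a stage index to a function that modifies the RGB
                 image produced at that stage before it is passed on.
    """
    corrections = corrections or {}
    for i, bridge in enumerate(bridges):
        if i not in skip:
            image = [bridge.decode(run_stage(bridge.encode(p), text_vec))
                     for p in image]
        if i in corrections:
            image = corrections[i](image)
    return image


source = [[0.2, 0.4, 0.6], [0.9, 0.1, 0.5]]  # a two-pixel "image"
text = [0.1] * FEAT_DIM                      # stand-in text embedding
stages = [ColorSpaceBridge(seed=0), ColorSpaceBridge(seed=1)]
edited = edit(source, text, stages)
```

Passing `skip={0}` bypasses the first stage, and `corrections={0: some_function}` lets a user adjust the first stage's RGB output before the second stage sees it. The symmetric detail correction module mentioned in the abstract is not modeled here.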

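Cite this article

WU Fu-Xiang, CHENG Jun. Configurable Text-based Image Editing by Autoencoder-based Generative Adversarial Networks. Journal of Software (软件学报), 2022, 33(9): 3139-3151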

Article metrics

Views: 1512
Downloads: 4594
HTML views: 3193
Citations: 0

History

Received: 2021-06-30
Revised: 2021-08-15
Published online: 2022-02-22
Published: 2022-09-06
