Text-to-Chinese-painting Method Based on Multi-domain VQGAN

doi:10.13328/j.cnki.jos.006769

Volume 34, Issue 5, 2023, pp. 2116-2133. DOI: 10.13328/j.cnki.jos.006769




Abstract:

With the development of generative adversarial networks (GANs), synthesizing images from textual descriptions has become an active research area. However, the textual descriptions used for image generation are usually in English, and the generated objects are mostly faces, flowers, birds, and the like. Few studies have addressed generating Chinese paintings from Chinese descriptions. Text-to-image generation typically requires a very large number of labeled image-text pairs, and such datasets are expensive to produce. Advances in multimodal pre-training make it possible to guide the GAN generation process through optimization, which significantly reduces the demand for data and computational resources. In this study, a multi-domain vector quantization generative adversarial network (VQGAN) model is proposed to generate Chinese paintings in multiple domains simultaneously. Furthermore, the multimodal pre-trained model WenLan is used to calculate the distance loss between generated images and textual descriptions. Semantic consistency between images and texts is achieved by optimizing the latent variables that are input into the multi-domain VQGAN. Finally, an ablation experiment compares different variants of the multi-domain VQGAN in terms of the FID and R-precision metrics, and a user study is carried out. The results demonstrate that the complete multi-domain VQGAN model outperforms the original VQGAN model in both image quality and text-image semantic consistency.
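The latent-optimization idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the multi-domain VQGAN decoder and the WenLan image encoder are replaced here by hypothetical random linear maps (`decoder`, `image_encoder`), the text embedding is a random vector, and the distance loss is assumed to be cosine distance. Only the overall loop (keep the generator fixed, update the latent so the encoded image moves toward the text embedding) reflects what the abstract describes.

```python
import numpy as np


def optimize_latent(decoder, image_encoder, text_emb, z0, lr=0.05, steps=300):
    """Gradient descent on the cosine distance between an encoded image and a text embedding.

    decoder        -- toy stand-in for a frozen generator: image = decoder @ z
    image_encoder  -- toy stand-in for a frozen image encoder: emb = image_encoder @ image
    text_emb       -- fixed embedding of the textual description
    z0             -- initial latent vector
    Returns the optimized latent and the loss recorded before each update.
    """
    # Both stand-ins are linear, so they compose into one map latent -> embedding.
    latent_to_emb = image_encoder @ decoder
    text_norm = np.linalg.norm(text_emb)
    z = z0.astype(float).copy()
    history = []
    for _ in range(steps):
        v = latent_to_emb @ z
        v_norm = np.linalg.norm(v)
        cos = float(v @ text_emb) / (v_norm * text_norm)
        history.append(1.0 - cos)  # cosine distance (assumed loss form)
        # Analytic gradient of the cosine similarity with respect to v.
        dcos_dv = text_emb / (v_norm * text_norm) - cos * v / (v_norm ** 2)
        # Chain rule back to z; ascending the cosine descends the distance.
        z = z + lr * (latent_to_emb.T @ dcos_dv)
    return z, history


# Toy setup: 4-dim latent, 32-dim "image", 16-dim shared embedding space.
rng = np.random.default_rng(0)
toy_decoder = rng.normal(size=(32, 4))
toy_image_encoder = rng.normal(size=(16, 32))
toy_text_emb = rng.normal(size=16)
z_opt, losses = optimize_latent(toy_decoder, toy_image_encoder, toy_text_emb,
                                rng.normal(size=4))
print(f"distance before: {losses[0]:.3f}, after: {losses[-1]:.3f}")
```

In a real system the two linear maps would be deep networks and the gradient would come from automatic differentiation rather than the closed form used here. Because the toy embedding space has more dimensions than the latent, the distance generally does not reach zero.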

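Citation:

Sun Zelong, Yang Guoxing, Wen Jingyuan, Fei Nanyi, Lu Zhiwu, Wen Jirong. Text-to-Chinese-painting Method Based on Multi-domain VQGAN. Journal of Software (软件学报), 2023, 34(5): 2116-2133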



History

Received: April 16, 2022
Revised: May 29, 2022
Online: September 20, 2022
