Survey on Deep Learning Methods for Freehand-Sketch-Based Visual Content Generation
Author: Zuo Ran, Hu Haoxiang, Deng Xiaoming, Ma Cuixia, Wang Hongan
Affiliation:

    Abstract:

    Freehand sketches let users convey creative intent intuitively with a few simple lines, expressing a thinking process or design inspiration and serving as input for producing target images or videos. With the development of deep learning, sketch-based visual content generation performs cross-domain feature mapping by learning the feature distributions of sketches and visual objects (images and videos), enabling the automated generation of sketches from images and of images or videos from sketches. Compared with traditional manual creation, such methods markedly improve the efficiency and diversity of generation; the topic has therefore become one of the most important research directions in computer vision and computer graphics and plays an important role in design, visual creation, and related fields. This study presents an overview of the research progress and future development of deep learning methods for sketch-based visual content generation. According to the visual object involved, existing work is classified into sketch-based image generation and sketch-based video generation, and the generation models are analyzed in detail in combination with specific tasks, including cross-domain generation between sketches and visual content, style transfer, and editing of visual content. The study then summarizes and compares the commonly used datasets, describes sketch propagation methods for addressing insufficient sketch data, and reviews evaluation methods for generative models. Finally, it discusses the challenges that sketches face in visual content generation applications and outlines future research directions for generative models.
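Among the evaluation methods for generative models surveyed here, the Fréchet inception distance (FID) of reference [135] is the most widely used: it fits Gaussians to deep features of real and generated images and measures the Fréchet distance between them. A minimal NumPy sketch of that distance is shown below (the function name and the toy Gaussian features are illustrative; a real FID computation would use Inception-v3 activations, not random features):

```python
import numpy as np

def frechet_distance(mu1, sigma1, mu2, sigma2):
    """Fréchet distance between N(mu1, sigma1) and N(mu2, sigma2):
    ||mu1 - mu2||^2 + Tr(S1 + S2 - 2 (S1 S2)^(1/2))."""
    diff = mu1 - mu2
    # Symmetric PSD square root of sigma1 via eigendecomposition.
    w, v = np.linalg.eigh(sigma1)
    s1_half = v @ np.diag(np.sqrt(np.clip(w, 0.0, None))) @ v.T
    # s1_half @ sigma2 @ s1_half is similar to sigma1 @ sigma2,
    # so it has the same eigenvalues and Tr of the square root matches.
    ew = np.linalg.eigvalsh(s1_half @ sigma2 @ s1_half)
    tr_sqrt = np.sum(np.sqrt(np.clip(ew, 0.0, None)))
    return float(diff @ diff + np.trace(sigma1) + np.trace(sigma2) - 2.0 * tr_sqrt)

# Toy check: Gaussians fitted to identical feature sets are at distance 0.
rng = np.random.default_rng(0)
feats = rng.normal(size=(500, 4))
mu, sigma = feats.mean(axis=0), np.cov(feats, rowvar=False)
print(abs(frechet_distance(mu, sigma, mu, sigma)) < 1e-6)  # True
```

A lower score indicates that the generated-image feature distribution is closer to the real one; shifting the generated features' mean by a vector d increases the distance by exactly ||d||^2 when the covariances are equal.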

    Reference
    [1] Chen T, Cheng MM, Tan P, Shamir A, Hu SM. Sketch2Photo: Internet image montage. ACM Trans. on Graphics, 2009, 28(5): 1–10.
    [2] Eitz M, Richter R, Hildebrand K, Boubekeur T, Alexa M. PhotoSketcher: Interactive sketch-based image synthesis. IEEE Computer Graphics and Applications, 2011, 31(6): 56–66.
    [3] Xie SN, Tu ZW. Holistically-nested edge detection. In: Proc. of the 2015 IEEE Int’l Conf. on Computer Vision. Santiago: IEEE, 2015. 1395–1403.
    [4] Winnemöller H, Kyprianidis JE, Olsen SC. XDoG: An extended difference-of-Gaussians compendium including advanced image stylization. Computers & Graphics, 2012, 36(6): 740–753.
    [5] Kang H, Lee S, Chui CK. Coherent line drawing. In: Proc. of the 5th Int’l Symp. on Non-photorealistic Animation and Rendering. San Diego: ACM, 2007. 43–50.
    [6] Lu CW, Xu L, Jia JY. Combining sketch and tone for pencil drawing production. In: Proc. of the 2012 Symp. on Non-photorealistic Animation and Rendering. Annecy: Eurographics Association, 2012. 65–73.
    [7] Su QK, Bai X, Fu HB, Tai CL, Wang J. Live sketch: Video-driven dynamic deformation of static drawings. In: Proc. of the 2018 CHI Conf. on Human Factors in Computing Systems. Montreal: ACM, 2018. 662.
    [8] Dvorožnák M, Li W, Kim VG, Sýkora D. Toonsynth: Example-based synthesis of hand-colored cartoon animations. ACM Trans. on Graphics, 2018, 37(4): 167.
    [9] Kingma DP, Welling M. Auto-encoding variational Bayes. In: Proc. of the 2nd Int’l Conf. on Learning Representations. Banff: ICLR, 2014.
    [10] Goodfellow IJ, Pouget-Abadie J, Mirza M, Xu B, Warde-Farley D, Ozair S, Courville AC, Bengio Y. Generative adversarial nets. In: Proc. of the 27th Int’l Conf. on Neural Information Processing Systems. Montreal: MIT Press, 2014. 2672–2680.
    [11] Mirza M, Osindero S. Conditional generative adversarial nets. arXiv:1411.1784, 2014.
    [12] Xu P, Hospedales TM, Yin QY, Song YZ, Xiang T, Wang L. Deep learning for free-hand sketch: A survey. IEEE Trans. on Pattern Analysis and Machine Intelligence, 2023, 45(1): 285–312.
    [13] Elasri M, Elharrouss O, Al-Maadeed S, Tairi H. Image generation: A review. Neural Processing Letters, 2022, 54(5): 4609–4646.
    [14] Zhan FN, Yu YC, Wu RL, Zhang JH, Lu SJ, Liu LJ, Kortylewski A, Theobalt C, Xing E. Multimodal image synthesis and editing: A survey. arXiv:2112.13592v3, 2021.
    [15] Chen SY, Zhang JQ, Zhao YY, Rosin PL, Lai YK, Gao L. A review of image and video colorization: From analogies to deep learning. Visual Informatics, 2022, 6(3): 51–68.
    [16] 王建欣, 史英杰, 刘昊, 黄海峤, 杜方. 基于GAN的手绘草图图像翻译研究综述. 计算机应用研究, 2022, 39(8): 2249–2256.
    Wang JX, Shi YJ, Liu H, Huang HQ, Du F. Research on freehand sketch to image translation based on generative adversarial networks. Application Research of Computers, 2022, 39(8): 2249–2256 (in Chinese with English abstract).
    [17] Li MT, Lin Z, Mech R, Yumer E, Ramanan D. Photo-sketching: Inferring contour drawings from images. In: Proc. of the 2019 IEEE Winter Conf. on Applications of Computer Vision. Waikoloa: IEEE, 2019. 1403–1412.
    [18] Song JF, Pang KY, Song YZ, Xiang T, Hospedales TM. Learning to sketch with shortcut cycle consistency. In: Proc. of the 2018 IEEE/CVF Conf. on Computer Vision and Pattern Recognition. Salt Lake City: IEEE, 2018. 801–810.
    [19] Chen W, Hays J. SketchyGAN: Towards diverse and realistic sketch to image synthesis. In: Proc. of the 2018 IEEE/CVF Conf. on Computer Vision and Pattern Recognition. Salt Lake City: IEEE, 2018. 9416–9425.
    [20] Chen SY, Su WC, Gao L, Xia SH, Fu HB. DeepFaceDrawing: Deep generation of face images from sketches. ACM Trans. on Graphics, 2020, 39(4): 72.
    [21] Qiu HN, Wang C, Zhu H, Zhu XY, Gu JJ, Han XG. Two-phase hair image synthesis by self-enhancing generative model. Computer Graphics Forum, 2019, 38(7): 403–412.
    [22] Wu X, Wang C, Fu HB, Shamir A, Zhang SH. DeepPortraitDrawing: Generating human body images from freehand sketches. Computers & Graphics, 2023, 116: 73–81.
    [23] Ham C, Tarrés GC, Bui T, Hays J, Lin Z, Collomosse J. CoGS: Controllable generation and search from sketch and style. In: Proc. of the 2022 European Conf. on Computer Vision. Tel Aviv: Springer, 2022. 632–650.
    [24] Chen SY, Liu FL, Lai YK, Rosin PL, Li CP, Gao L. DeepFaceEditing: Deep face generation and editing with disentangled geometry and appearance control. ACM Trans. on Graphics, 2021, 40(4): 90.
    [25] Zeng Y, Lin Z, Patel VM. SketchEdit: Mask-free local image manipulation with partial sketches. In: Proc. of the 2022 IEEE/CVF Conf. on Computer Vision and Pattern Recognition. New Orleans: IEEE, 2022. 5941–5951.
    [26] Wang YX, Wei YC, Qian XM, Zhu L, Yang Y. Sketch-guided scenery image outpainting. IEEE Trans. on Image Processing, 2021, 30: 2643–2655.
    [27] Li XY, Zhang B, Liao J, Sander PV. Deep sketch-guided cartoon video inbetweening. IEEE Trans. on Visualization and Computer Graphics, 2022, 28(8): 2938–2952.
    [28] Liu FL, Chen SY, Lai YK, Li CP, Jiang YR, Fu HB, Gao L. DeepFaceVideoEditing: Sketch-based deep editing of face videos. ACM Trans. on Graphics, 2022, 41(4): 167.
    [29] Shi M, Zhang JQ, Chen SY, Gao L, Lai YK, Zhang FL. Reference-based deep line art video colorization. IEEE Trans. on Visualization and Computer Graphics, 2023, 29(6): 2965–2979.
    [30] Kampelmuhler M, Pinz A. Synthesizing human-like sketches from natural images using a conditional convolutional decoder. In: Proc. of the 2020 IEEE Winter Conf. on Applications of Computer Vision. Snowmass: IEEE, 2020. 3192–3200.
    [31] Ashtari A, Seo CW, Kang C, Cha SH, Noh J. Reference based sketch extraction via attention mechanism. ACM Trans. on Graphics, 2022, 41(6): 207.
    [32] Zhang Y, Su GY, Qi YG, Yang J. Unpaired image-to-sketch translation network for sketch synthesis. In: Proc. of the 2019 IEEE Visual Communications and Image Processing. Sydney: IEEE, 2019. 1–4.
    [33] Vinker Y, Pajouheshgar E, Bo JY, Bachmann RC, Bermano AH, Cohen-Or D, Zamir AR, Shamir A. CLIPasso: Semantically-aware object sketching. ACM Trans. on Graphics, 2022, 41(4): 86.
    [34] Zhu MR, Liang CC, Wang NN, Wang XY, Li ZF, Gao XB. A Sketch-Transformer network for face photo-sketch synthesis. In: Proc. of the 30th Int’l Joint Conf. on Artificial Intelligence. Montreal: IJCAI.org, 2021. 1352–1358.
    [35] Yu J, Xu XX, Gao F, Shi SJ, Wang M, Tao DC, Huang QM. Toward realistic face photo-sketch synthesis via composition-aided GANs. IEEE Trans. on Cybernetics, 2021, 51(9): 4350–4362.
    [36] Qi XQ, Sun MY, Wang WN, Dong XX, Li Q, Shan CF. Face sketch synthesis via semantic-driven generative adversarial network. In: Proc. of the 2021 IEEE Int’l Joint Conf. on Biometrics (IJCB). Shenzhen: IEEE, 2021. 1–8.
    [37] Zhang CY, Liu DC, Peng CL, Wang NN, Gao XB. Edge aware domain transformation for face sketch synthesis. IEEE Trans. on Information Forensics and Security, 2022, 17: 2761–2770.
    [38] Park T, Liu MY, Wang TC, Zhu JY. Semantic image synthesis with spatially-adaptive normalization. In: Proc. of the IEEE/CVF Conf. on Computer Vision and Pattern Recognition. Long Beach: IEEE, 2019. 2337–2346.
    [39] Isola P, Zhu JY, Zhou TH, Efros AA. Image-to-image translation with conditional adversarial networks. In: Proc. of the 2017 IEEE Conf. on Computer Vision and Pattern Recognition. Honolulu: IEEE, 2017. 5967–5976.
    [40] Lu YY, Wu SZ, Tai YW, Tang CK. Image generation from sketch constraint using contextual GAN. In: Proc. of the 15th European Conf. on Computer Vision. Munich: Springer, 2018. 205–220.
    [41] Koley S, Bhunia AK, Sain A, Chowdhury PN, Xiang T, Song YZ. Picture that sketch: Photorealistic image generation from abstract sketches. In: Proc. of the 2023 IEEE/CVF Conf. on Computer Vision and Pattern Recognition. Vancouver: IEEE, 2023. 6850–6861.
    [42] Ghosh A, Zhang R, Dokania PK, Wang O, Efros A, Torr P, Shechtman E. Interactive sketch & fill: Multiclass sketch-to-image translation. In: Proc. of the 2019 IEEE/CVF Int’l Conf. on Computer Vision. Seoul: IEEE, 2019. 1171–1180.
    [43] Li ZY, Deng C, Yang EK, Tao DC. Staged sketch-to-image synthesis via semi-supervised generative adversarial networks. IEEE Trans. on Multimedia, 2021, 23: 2694–2705.
    [44] 宗雨佳. 两阶段草图至图像生成模型与应用实现 [硕士学位论文]. 大连: 大连理工大学, 2021.
    Zong YJ. A two-stage method and application implementation for image generation from sketch [MS. Thesis]. Dalian: Dalian University of Technology, 2021 (in Chinese with English abstract).
    [45] 蔡雨婷, 陈昭炯, 叶东毅. 基于双层级联GAN的草图到真实感图像的异质转换. 模式识别与人工智能, 2018, 31(10): 877–886.
    Cai YT, Chen ZJ, Ye DY. Bi-level cascading GAN-based heterogeneous conversion of sketch-to-realistic images. Pattern Recognition and Artificial Intelligence, 2018, 31(10): 877–886 (in Chinese with English abstract).
    [46] Gao CY, Liu Q, Xu Q, Wang LM, Liu JZ, Zou CQ. SketchyCOCO: Image generation from freehand scene sketches. In: Proc. of the 2020 IEEE/CVF Conf. on Computer Vision and Pattern Recognition. Seattle: IEEE, 2020. 5174–5183.
    [47] Ronneberger O, Fischer P, Brox T. U-Net: Convolutional networks for biomedical image segmentation. In: Proc. of the 18th Int’l Conf. on Medical Image Computing and Computer-assisted Intervention. Munich: Springer, 2015. 234–241.
    [48] Szegedy C, Ioffe S, Vanhoucke V, Alemi AA. Inception-v4, inception-ResNet and the impact of residual connections on learning. In: Proc. of the 31st AAAI Conf. on Artificial Intelligence. San Francisco: AAAI Press, 2017. 4278–4284.
    [49] Karras T, Laine S, Aila T. A style-based generator architecture for generative adversarial networks. In: Proc. of the 2019 IEEE/CVF Conf. on Computer Vision and Pattern Recognition. Long Beach: IEEE, 2019. 4396–4405.
    [50] Huang X, Liu MY, Belongie S, Kautz J. Multimodal unsupervised image-to-image translation. In: Proc. of the 15th European Conf. on Computer Vision. Munich: Springer, 2018. 172–189.
    [51] Li YH, Chen XJ, Yang BX, Chen ZH, Cheng ZH, Zha ZJ. DeepFacePencil: Creating face images from freehand sketches. In: Proc. of the 28th ACM Int’l Conf. on Multimedia. Seattle: ACM, 2020. 991–999.
    [52] Xia WH, Yang YJ, Xue JH. Cali-sketch: Stroke calibration and completion for high-quality face image generation from human-like sketches. Neurocomputing, 2021, 460: 256–265.
    [53] Yang S, Wang ZY, Liu JY, Guo ZM. Controllable sketch-to-image translation for robust face synthesis. IEEE Trans. on Image Processing, 2021, 30: 8797–8810.
    [54] Li YH, Chen XJ, Wu F, Zha ZJ. LinesToFacePhoto: Face photo generation from lines with conditional self-attention generative adversarial networks. In: Proc. of the 27th ACM Int’l Conf. on Multimedia. Nice: ACM, 2019. 2323–2331.
    [55] Roweis ST, Saul LK. Nonlinear dimensionality reduction by locally linear embedding. Science, 2000, 290(5500): 2323–2326.
    [56] Yang Y, Hossain Z, Gedeon T, Rahman S. S2FGAN: Semantically aware interactive sketch-to-face translation. In: Proc. of the 2022 IEEE/CVF Winter Conf. on Applications of Computer Vision. Waikoloa: IEEE, 2022. 3162–3171.
    [57] Richardson E, Alaluf Y, Patashnik O, Nitzan Y, Azar Y, Shapiro S, Cohen-Or D. Encoding in style: A StyleGAN encoder for image-to-image translation. In: Proc. of the 2021 IEEE/CVF Conf. on Computer Vision and Pattern Recognition. Nashville: IEEE, 2021. 2287–2296.
    [58] Olszewski K, Ceylan D, Xing J, Echevarria J, Chen ZL, Chen WK, Li H. Intuitive, interactive beard and hair synthesis with generative models. In: Proc. of the 2020 IEEE/CVF Conf. on Computer Vision and Pattern Recognition. Seattle: IEEE, 2020. 7444–7454.
    [59] Xiao CF, Yu D, Han XG, Zheng YY, Fu HB. SketchHairSalon: Deep sketch-based hair image synthesis. ACM Trans. on Graphics, 2021, 40(6): 216.
    [60] Ho TT, Virtusio JJ, Chen YY, Hsu CM, Hua KL. Sketch-guided deep portrait generation. ACM Trans. on Multimedia Computing, Communications, and Applications, 2020, 16(3): 88.
    [61] Chen H, Zhu SC. A generative sketch model for human hair analysis and synthesis. IEEE Trans. on Pattern Analysis and Machine Intelligence, 2006, 28(7): 1025–1040.
    [62] Zhu JY, Park T, Isola P, Efros AA. Unpaired image-to-image translation using cycle-consistent adversarial networks. In: Proc. of the 2017 IEEE Int’l Conf. on Computer Vision. Venice: IEEE, 2017. 2242–2251.
    [63] Xiang XY, Liu D, Yang X, Zhu YH, Shen XH, Allebach JP. Adversarial open domain adaptation for sketch-to-photo synthesis. In: Proc. of the 2022 IEEE/CVF Winter Conf. on Applications of Computer Vision. Waikoloa: IEEE, 2022. 944–954.
    [64] Liu RT, Yu Q, Yu SX. Unsupervised sketch to photo synthesis. In: Proc. of the 16th European Conf. on Computer Vision. Glasgow: Springer, 2020. 36–52.
    [65] Kazemi H, Taherkhani F, Nasrabadi NM. Unsupervised facial geometry learning for sketch to photo synthesis. In: Proc. of the 2018 Int’l Conf. of the Biometrics Special Interest Group. Darmstadt: IEEE, 2018. 1–5.
    [66] Bashkirova D, Lezama J, Sohn K, Saenko K, Essa I. MaskSketch: Unpaired structure-guided masked image generation. In: Proc. of the 2023 IEEE/CVF Conf. on Computer Vision and Pattern Recognition. Vancouver: IEEE, 2023. 1879–1889.
    [67] Chang HW, Zhang H, Jiang L, Liu C, Freeman WT. MaskGIT: Masked generative image transformer. In: Proc. of the 2022 IEEE/CVF Conf. on Computer Vision and Pattern Recognition. New Orleans: IEEE, 2022. 11305–11315.
    [68] Esser P, Rombach R, Ommer B. Taming transformers for high-resolution image synthesis. In: Proc. of the 2021 IEEE/CVF Conf. on Computer Vision and Pattern Recognition. Nashville: IEEE, 2021. 12868–12878.
    [69] Wang SY, Bau D, Zhu JY. Sketch your own GAN. In: Proc. of the 2021 IEEE/CVF Int’l Conf. on Computer Vision. Montreal: IEEE, 2021. 14030–14040.
    [70] Israr SM, Zhao F. Customizing GAN using few-shot sketches. In: Proc. of the 30th ACM Int’l Conf. on Multimedia. Lisboa: ACM, 2022. 2229–2238.
    [71] Yang BX, Chen XJ, Wang CQ, Zhang C, Chen ZH, Sun XY. Semantics-preserving sketch embedding for face generation. IEEE Trans. on Multimedia, 2022. 1–15.
    [72] Liu BC, Song KP, Zhu YZ, Elgammal A. Sketch-to-art: Synthesizing stylized art images from sketches. In: Proc. of the 15th Asian Conf. on Computer Vision. Kyoto: Springer, 2020. 207–222.
    [73] Zhang LM, Ji Y, Lin X, Liu CP. Style transfer for anime sketches with enhanced residual U-Net and auxiliary classifier GAN. In: Proc. of the 4th IAPR Asian Conf. on Pattern Recognition (ACPR). Nanjing: IEEE, 2017. 506–511.
    [74] Zhang LM, Li CZ, Simo-Serra E, Ji Y, Wong TT, Liu CP. User-guided line art flat filling with split filling mechanism. In: Proc. of the 2021 IEEE/CVF Conf. on Computer Vision and Pattern Recognition. Nashville: IEEE, 2021. 9889–9898.
    [75] Sangkloy P, Lu JW, Fang C, Yu F, Hays J. Scribbler: Controlling deep image synthesis with sketch and color. In: Proc. of the 2017 IEEE Conf. on Computer Vision and Pattern Recognition. Honolulu: IEEE, 2017. 6836–6845.
    [76] Xian W, Sangkloy P, Agrawal V, Raj A, Lu JW, Fang C, Yu F, Hays J. TextureGAN: Controlling deep image synthesis with texture patches. In: Proc. of the 2018 IEEE/CVF Conf. on Computer Vision and Pattern Recognition. Salt Lake City: IEEE, 2018. 8456–8465.
    [77] Li JN, Liu SQ, Cao MY. Line artist: A multiple style sketch to painting synthesis scheme. arXiv:1803.06647, 2018.
    [78] Simonyan K, Zisserman A. Very deep convolutional networks for large-scale image recognition. In: Proc. of the 3rd Int’l Conf. on Learning Representations. San Diego: ICLR, 2015.
    [79] Page L, Brin S, Motwani R, Winograd T. The PageRank citation ranking: Bringing order to the Web. Technical Report, Stanford InfoLab, 1999.
    [80] Huang JL, Jing L, Tan ZF, Kwong S. Multi-density sketch-to-image translation network. IEEE Trans. on Multimedia, 2021, 24: 4002–4015.
    [81] Tan ZT, Chai ML, Chen DD, Liao J, Chu Q, Yuan L, Tulyakov S, Yu N. MichiGAN: Multi-input-conditioned hair image generation for portrait editing. ACM Trans. on Graphics, 2020, 39(4): 95.
    [82] Lee J, Kim E, Lee Y, Kim D, Chang J, Choo J. Reference-based sketch image colorization using augmented-self reference and dense semantic correspondence. In: Proc. of the 2020 IEEE/CVF Conf. on Computer Vision and Pattern Recognition. Seattle: IEEE, 2020. 5800–5809.
    [83] Zhang P, Zhang B, Chen D, Yuan L, Wen F. Cross-domain correspondence learning for exemplar-based image translation. In: Proc. of the 2020 IEEE/CVF Conf. on Computer Vision and Pattern Recognition. Seattle: IEEE, 2020. 5143–5153.
    [84] Zhou XR, Zhang B, Zhang T, Zhang P, Bao JM, Chen D, Zhang ZF, Wen F. CoCosNet v2: Full-resolution correspondence learning for image translation. In: Proc. of the 2021 IEEE Conf. on Computer Vision and Pattern Recognition. Nashville: IEEE, 2021. 11465–11475.
    [85] Zuiderveld J. Style-content disentanglement in language-image pretraining representations for zero-shot sketch-to-image synthesis. arXiv:2206.01661v1, 2022.
    [86] Radford A, Kim JW, Hallacy C, Ramesh A, Goh G, Agarwal S, Sastry G, Askell A, Mishkin P, Clark J, Krueger G, Sutskever I. Learning transferable visual models from natural language supervision. In: Proc. of the 38th Int’l Conf. on Machine Learning. PMLR, 2021. 8748–8763.
    [87] Deng J, Dong W, Socher R, Li LJ, Li K, Li FF. ImageNet: A large-scale hierarchical image database. In: Proc. of the 2009 IEEE Conf. on Computer Vision and Pattern Recognition. Miami: IEEE, 2009. 248–255.
    [88] Liu BC, Zhu YZ, Song KP, Elgammal A. Self-supervised sketch-to-image synthesis. In: Proc. of the 37th AAAI Conf. on Artificial Intelligence. Washington: AAAI Press, 2021. 2073–2081.
    [89] Odena A, Olah C, Shlens J. Conditional image synthesis with auxiliary classifier GANs. In: Proc. of the 34th Int’l Conf. on Machine Learning. Sydney: PMLR, 2017. 2642–2651.
    [90] Zhang LM, Li CZ, Wong TT, Li Y, Liu CP. Two-stage sketch colorization. ACM Trans. on Graphics, 2018, 37(6): 261.
    [91] Kim H, Jhoo HY, Park E, Yoo S. Tag2Pix: Line art colorization using text tag with SECat and changing loss. In: Proc. of the 2019 IEEE/CVF Int’l Conf. on Computer Vision. Seoul: IEEE, 2019. 9055–9064.
    [92] Hu J, Shen L, Sun G. Squeeze-and-excitation networks. In: Proc. of the 2018 IEEE/CVF Conf. on Computer Vision and Pattern Recognition. Salt Lake City: IEEE, 2018. 7132–7141.
    [93] Jo Y, Park J. SC-FEGAN: Face editing generative adversarial network with user’s sketch and color. In: Proc. of the 2019 IEEE/CVF Int’l Conf. on Computer Vision. Seoul: IEEE, 2019. 1745–1753.
    [94] Portenier T, Hu QY, Szabo A, Bigdeli SA, Favaro P, Zwicker M. FaceShop: Deep sketch-based face image editing. ACM Trans. on Graphics, 2018, 37(4): 99.
    [95] Gulrajani I, Ahmed F, Arjovsky M, Dumoulin V, Courville AC. Improved training of Wasserstein GANs. In: Proc. of the 31st Int’l Conf. on Neural Information Processing Systems. Long Beach: Curran Associates Inc., 2017. 5769–5779.
    [96] Yu JH, Lin Z, Yang JM, Shen XH, Lu X, Huang T. Free-form image inpainting with gated convolution. In: Proc. of the 2019 IEEE/CVF Int’l Conf. on Computer Vision. Seoul: IEEE, 2019. 4470–4479.
    [97] Yang S, Wang ZY, Liu JY, Guo ZM. Deep plastic surgery: Robust and controllable image editing with human-drawn sketches. In: Proc. of the 16th European Conf. on Computer Vision. Glasgow: Springer, 2020. 601–617.
    [98] Liu HY, Wan ZY, Huang W, Song YB, Han XT, Liao J, Jiang B, Liu W. DeFLOCNet: Deep image editing via flexible low-level controls. In: Proc. of the 2021 IEEE/CVF Conf. on Computer Vision and Pattern Recognition. Nashville: IEEE, 2021. 10760–10769.
    [99] Graves A. Long short-term memory. In: Graves A, ed. Supervised Sequence Labelling with Recurrent Neural Networks. Berlin: Springer, 2012. 37–45.
    [100] Zhang HC, Yu G, Chen T, Luo GZ. Sketch me a video. arXiv:2110.04710v1, 2021.
    [101] Loftsdottir D, Guzdial M. SketchBetween: Video-to-video synthesis for sprite animation via sketches. In: Proc. of the 17th Int’l Conf. on the Foundations of Digital Games. Athens: ACM, 2022. 32.
    [102] Thasarathan H, Nazeri K, Ebrahimi M. Automatic temporally coherent video colorization. In: Proc. of the 16th Conf. on Computer and Robot Vision (CRV). Kingston: IEEE, 2019. 189–194.
    [103] Wang TC, Liu MY, Zhu JY, Tao A, Kautz J, Catanzaro B. High-resolution image synthesis and semantic manipulation with conditional GANs. In: Proc. of the 2018 IEEE/CVF Conf. on Computer Vision and Pattern Recognition. Salt Lake City: IEEE, 2018. 8798–8807.
    [104] Xue TF, Wu JJ, Bouman KL, Freeman WT. Visual dynamics: Probabilistic future frame synthesis via cross convolutional networks. In: Proc. of the 30th Int’l Conf. on Neural Information Processing Systems. Barcelona: Curran Associates Inc., 2016. 91–99.
    [105] van den Oord A, Vinyals O, Kavukcuoglu K. Neural discrete representation learning. In: Proc. of the 31st Int’l Conf. on Neural Information Processing Systems. Long Beach: Curran Associates Inc., 2017. 6309–6318.
    [106] Yu Q, Liu F, Song YZ, Xiang T, Hospedales TM, Loy CC. Sketch me that shoe. In: Proc. of the 2016 IEEE Conf. on Computer Vision and Pattern Recognition. Las Vegas: IEEE, 2016. 799–807.
    [107] Sangkloy P, Burnell N, Ham C, Hays J. The sketchy database: Learning to retrieve badly drawn bunnies. ACM Trans. on Graphics, 2016, 35(4): 119.
    [108] Tang XG, Wang XO. Face photo-sketch synthesis and recognition. IEEE Trans. on Pattern Analysis and Machine Intelligence, 2009, 31(11): 1955–1967.
    [109] Zhang W, Wang WG, Tang XO. Coupled information-theoretic encoding for face photo-sketch recognition. In: Proc. of the 2011 Conf. on Computer Vision and Pattern Recognition (CVPR). Colorado Springs: IEEE, 2011. 513–520.
    [110] Liu ZW, Luo P, Wang XG, Tang XO. Deep learning face attributes in the wild. In: Proc of the 2015 IEEE Int’l Conf. on Computer Vision. Santiago: IEEE, 2015. 3730–3738.
    [111] Wah C, Branson S, Welinder P, Perona P, Belongie S. The Caltech-UCSD Birds-200-2011 dataset. California: California Institute of Technology, 2011. https://authors.library.caltech.edu/records/cvm3y-5hh21
    [112] Krause J, Stark M, Deng J, Li FF. 3D object representations for fine-grained categorization. In: Proc. of the 2013 IEEE Int’l Conf. on Computer Vision Workshops. Sydney: IEEE, 2013. 554–561.
    [113] Pirrone R, Cannella V, Gambino O, Pipitone A, Russo G. WikiArt: An ontology-based information retrieval system for arts. In: Proc. of the 9th Int’l Conf. on Intelligent Systems Design and Applications. Pisa: IEEE, 2009. 913–918.
    [114] Karras T, Aila T, Laine S, Lehtinen J. Progressive growing of GANs for improved quality, stability, and variation. In: Proc. of the 6th Int’l Conf. on Learning Representations. Vancouver: OpenReview.net, 2018.
    [115] Lee CH, Liu ZW, Wu LY, Luo P. MaskGAN: Towards diverse and interactive facial image manipulation. In: Proc. of the 2020 IEEE/CVF Conf. on Computer Vision and Pattern Recognition. Seattle: IEEE, 2020. 5548–5557.
    [116] Branwen G, Gokaslan A. Danbooru2017: A large-scale crowdsourced and tagged anime illustration dataset. 2017. https://www.gwern.net/Danbooru2017
    [117] Yang ZX, Dong J, Liu P, Yang Y, Yan SC. Very long natural scenery image prediction by outpainting. In: Proc. of the 2019 IEEE/CVF Int’l Conf. on Computer Vision. Seoul: IEEE, 2019. 10560–10569.
    [118] Liu ZW, Luo P, Qiu S, Wang XG, Tang XO. DeepFashion: Powering robust clothes recognition and retrieval with rich annotations. In: Proc. of the 2016 IEEE Conf. on Computer Vision and Pattern Recognition. Las Vegas: IEEE, 2016. 1096–1104.
    [119] Zhou BL, Lapedriza A, Khosla A, Oliva A, Torralba A. Places: A 10 million image database for scene recognition. IEEE Trans. on Pattern Analysis and Machine Intelligence, 2018, 40(6): 1452–1464.
    [120] Chung JS, Nagrani A, Zisserman A. VoxCeleb2: Deep speaker recognition. In: Proc. of the 19th Annual Conf. of the Int’l Speech Communication Association (Interspeech 2018). Hyderabad: ISCA, 2018. 1086–1090.
    [121] Siarohin A, Lathuilière S, Tulyakov S, Ricci E, Sebe N. Animating arbitrary objects via deep motion transfer. In: Proc. of the 2019 IEEE/CVF Conf. on Computer Vision and Pattern Recognition. Long Beach: IEEE, 2019. 2372–2381.
    [122] Eitz M, Hays J, Alexa M. How do humans sketch objects? ACM Trans. on Graphics, 2012, 31(4): 1–10.
    [123] Ha D, Eck D. A neural representation of sketch drawings. In: Proc. of the 6th Int’l Conf. on Learning Representations. Vancouver: OpenReview.net, 2018.
    [124] Caesar H, Uijlings J, Ferrari V. COCO-stuff: Thing and stuff classes in context. In: Proc. of the 2018 IEEE/CVF Conf. on Computer Vision and Pattern Recognition. Salt Lake City: IEEE, 2018. 1209–1218.
    [125] Li YJ, Fang C, Hertzmann A, Shechtman E, Yang MH. Im2Pencil: Controllable pencil illustration from photographs. In: Proc. of the 2019 IEEE/CVF Conf. on Computer Vision and Pattern Recognition. Long Beach: IEEE, 2019. 1525–1534.
    [126] Gatys LA, Ecker AS, Bethge M. Image style transfer using convolutional neural networks. In: Proc. of the 2016 IEEE Conf. on Computer Vision and Pattern Recognition. Las Vegas: IEEE, 2016. 2414–2423.
    [127] Inoue N, Ito D, Xu N, Yang J, Price B, Yamasaki T. Learning to trace: Expressive line drawing generation from photographs. Computer Graphics Forum, 2019, 38(7): 69–80.
    [128] Recasens A, Kellnhofer P, Stent S, Matusik W, Torralba A. Learning to zoom: A saliency-based sampling layer for neural networks. In: Proc. of the 15th European Conf. on Computer Vision. Munich: Springer, 2018. 52–67.
    [129] Weber M. AutoTrace. 2018. http://autotrace.sourceforge.net/
    [130] Zhang TY, Suen CY. A fast parallel algorithm for thinning digital patterns. Communications of the ACM, 1984, 27(3): 236–239.
    [131] Simo-Serra E, Iizuka S, Ishikawa H. Mastering sketching: Adversarial augmentation for structured prediction. ACM Trans. on Graphics, 2018, 37(1): 11.
    [132] Simo-Serra E, Iizuka S, Sasaki K, Ishikawa H. Learning to simplify: Fully convolutional networks for rough sketch cleanup. ACM Trans. on Graphics, 2016, 35(4): 121.
    [133] Salimans T, Goodfellow I, Zaremba W, Cheung V, Radford A, Chen X. Improved techniques for training GANs. In: Proc. of the 30th Int’l Conf. on Neural Information Processing Systems. Barcelona: Curran Associates Inc., 2016. 2234–2242.
    [134] Szegedy C, Vanhoucke V, Ioffe S, Shlens J, Wojna Z. Rethinking the inception architecture for computer vision. In: Proc. of the 2016 IEEE Conf. on Computer Vision and Pattern Recognition. Las Vegas: IEEE, 2016. 2818–2826.
    [135] Heusel M, Ramsauer H, Unterthiner T, Nessler B, Hochreiter S. GANs trained by a two time-scale update rule converge to a local nash equilibrium. In: Proc. of the 31st Int’l Conf. on Neural Information Processing Systems. Long Beach: Curran Associates Inc., 2017. 6629–6640.
    [136] Bińkowski M, Sutherland DJ, Arbel M, Gretton A. Demystifying MMD GANs. In: Proc. of the 6th Int’l Conf. on Learning Representations. Vancouver: OpenReview.net, 2018.
    [137] Zhang R, Isola P, Efros AA, Shechtman E, Wang O. The unreasonable effectiveness of deep features as a perceptual metric. In: Proc. of the 2018 IEEE/CVF Conf. on Computer Vision and Pattern Recognition. Salt Lake City: IEEE, 2018. 586–595.
    [138] Wang Z, Bovik AC, Sheikh HR, Simoncelli EP. Image quality assessment: From error visibility to structural similarity. IEEE Trans. on Image Processing, 2004, 13(4): 600–612.
    [139] Zhang L, Zhang L, Mou XQ, Zhang D. FSIM: A feature similarity index for image quality assessment. IEEE Trans. on Image Processing, 2011, 20(8): 2378–2386.
    [140] Long J, Shelhamer E, Darrell T. Fully convolutional networks for semantic segmentation. In: Proc. of the 2015 IEEE Conf. on Computer Vision and Pattern Recognition. Boston: IEEE, 2015. 3431–3440.
    [141] Wang Q, Kong D, Lin FY, Qi YG. DiffSketching: Sketch control image synthesis with diffusion models. In: Proc. of the 33rd British Machine Vision Conf. London: BMVA Press, 2022. 67.
    [142] Cheng SI, Chen YJ, Chiu WC, Tseng HY, Lee HY. Adaptively-realistic image generation from stroke and sketch with diffusion model. In: Proc. of the 2023 IEEE/CVF Winter Conf. on Applications of Computer Vision. Waikoloa: IEEE, 2023. 4043–4051.
    [143] Zhang LM, Agrawala M. Adding conditional control to text-to-image diffusion models. arXiv:2302.05543, 2023.
    [144] Mou C, Wang XT, Xie LB, Wu YZ, Zhang J, Qi ZG, Shan Y, Qie XH. T2I-Adapter: Learning adapters to dig out more controllable ability for text-to-image diffusion models. arXiv:2302.08453, 2023.
    [145] Voynov A, Aberman K, Cohen-Or D. Sketch-guided text-to-image diffusion models. In: Proc. of the 2023 ACM Special Interest Group on Computer Graphics and Interactive Techniques Conf. (SIGGRAPH 2023) Conf. Los Angeles: ACM, 2023. 1–11.
    [146] MaungMaung A, Shing M, Mitsui K, Sawada K, Okura F. Text-guided scene sketch-to-photo synthesis. arXiv:2302.06883, 2023.
    [147] Kim K, Park S, Lee J, Choo J. Reference-based image composition with sketch via structure-aware diffusion model. arXiv:2304.09748, 2023.
    [148] Peng YC, Zhao CQ, Xie HR, Fukusato T, Miyata K. DiffFaceSketch: High-fidelity face image synthesis with sketch-guided latent diffusion model. arXiv:2302.06908, 2023.
    [149] Huang HB, Kalogerakis E, Yumer E, Mech R. Shape synthesis from sketches via procedural models and convolutional networks. IEEE Trans. on Visualization and Computer Graphics, 2017, 23(8): 2003–2013.
    [150] Wang LJ, Qian C, Wang JF, Fang Y. Unsupervised learning of 3D model reconstruction from hand-drawn sketches. In: Proc. of the 26th ACM Int’l Conf. on Multimedia. Seoul: ACM, 2018. 1820–1828.
    [151] Guillard B, Remelli E, Yvernay P, Fua P. Sketch2Mesh: Reconstructing and editing 3D shapes from sketches. In: Proc. of the 2021 IEEE/CVF Int’l Conf. on Computer Vision. Montreal: IEEE, 2021. 13003–13012.
    [152] Zhang SH, Guo YC, Gu QW. Sketch2Model: View-aware 3D modeling from single free-hand sketches. In: Proc. of the 2021 IEEE/CVF Conf. on Computer Vision and Pattern Recognition. Nashville: IEEE, 2021. 6000–6017.
    [153] Zhong Y, Qi YG, Gryaditskaya Y, Zhang HG, Song YZ. Towards practical sketch-based 3D shape generation: The role of professional sketches. IEEE Trans. on Circuits and Systems for Video Technology, 2021, 31(9): 3518–3528.
    [154] Gao CJ, Yu Q, Sheng L, Song YZ, Xu D. SketchSampler: Sketch-based 3D reconstruction via view-dependent depth sampling. In: Proc. of the 17th European Conf. on Computer Vision. Cham: Springer, 2022. 464–479.
    [155] Mikaeili A, Perel O, Safaee M, Cohen-Or D, Mahdavi-Amiri A. SKED: Sketch-guided text-based 3D editing. arXiv:2303.10735, 2023.
    [156] Mildenhall B, Srinivasan PP, Tancik M, Barron JT, Ramamoorthi R, Ng R. NeRF: Representing scenes as neural radiance fields for view synthesis. Communications of the ACM, 2021, 65(1): 99–106.
    [157] Han XG, Hou KC, Du D, Qiu YD, Cui SG, Zhou K, Yu YZ. CaricatureShop: Personalized and photorealistic caricature sketching. IEEE Trans. on Visualization and Computer Graphics, 2020, 26(7): 2349–2361.
    [158] Ling JW, Wang ZB, Lu M, Wang Q, Qian C, Xu F. Structure-aware editable morphable model for 3D facial detail animation and manipulation. In: Proc. of the 17th European Conf. on Computer Vision. Tel Aviv: Springer, 2022. 249–267.
    [159] Gao L, Liu FL, Chen SY, Jiang KW, Li CP, Lai YK, Fu HB. SketchFaceNeRF: Sketch-based facial generation and editing in neural radiance fields. ACM Trans. on Graphics, 2023, 42(4): 159.
    [160] Yang KZ, Lu JT, Hu SY, Chen XJ. Deep 3D modeling of human bodies from freehand sketching. In: Proc. of the 27th Int’l Conf. on Multimedia Modeling. Prague: Springer, 2021. 36–48.
    [161] Brodt K, Bessmeltsev M. Sketch2Pose: Estimating a 3D character pose from a bitmap sketch. ACM Trans. on Graphics, 2022, 41(4): 85.
    [162] Shen YF, Zhang CG, Fu HB, Zhou K, Zheng YY. DeepSketchHair: Deep sketch-based 3D hair modeling. IEEE Trans. on Visualization and Computer Graphics, 2021, 27(7): 3250–3263.
    [163] Li MC, Sheffer A, Grinspun E, Vining N. Foldsketch: Enriching garments with physically reproducible folds. ACM Trans. on Graphics, 2018, 37(4): 133.
    [164] Wang TY, Ceylan D, Popović J, Mitra NJ. Learning a shared shape space for multimodal garment design. ACM Trans. on Graphics, 2018, 37(6): 203.
    [165] Kaspar A, Wu K, Luo YY, Makatura L, Matusik W. Knit sketching: From cut & sew patterns to machine-knit garments. ACM Trans. on Graphics, 2021, 40(4): 63.
    [166] Deng Z, Liu Y, Pan H, Jabi W, Zhang JY, Deng BL. Sketch2PQ: Freeform planar quadrilateral mesh design via a single sketch. IEEE Trans. on Visualization and Computer Graphics, 2023, 29(9): 3826–3839.
Citation:

Zuo R, Hu HX, Deng XM, Ma CX, Wang HA. Survey on deep learning methods for freehand-sketch-based visual content generation. Ruan Jian Xue Bao/Journal of Software, 2024, 35(7): 3497–3530 (in Chinese with English abstract).

History
  • Received: March 02, 2023
  • Revised: June 05, 2023
  • Online: January 31, 2024
  • Published: July 06, 2024
Copyright: Institute of Software, Chinese Academy of Sciences