Review of Natural Scene Text Detection and Recognition Based on Deep Learning
WANG Jian-Xin
Journal of Software (软件学报, RJXB), ISSN 1000-9825. Editorial Department of Journal of Software, Beijing, China.
Article ID: rjxb-31-5-1465. doi: 10.13328/j.cnki.jos.005988. CLC number: TP391.
Column: Pattern Recognition and Artificial Intelligence
Natural scene text detection and recognition is important for extracting information from scenes, and it can be improved with the help of deep learning. This study classifies, analyzes, and summarizes deep learning-based methods for text detection and recognition in natural scenes. First, the research background and the main technical routes of natural scene text detection and recognition are discussed. Then, following the processing stages of natural scene text information, text detection models, text recognition models, and end-to-end text recognition models are introduced, and the basic ideas, advantages, and disadvantages of each method are analyzed. Next, the common standard datasets, performance evaluation metrics, and evaluation functions are enumerated, and the experimental results of different models are compared. Finally, the challenges and development trends of deep learning-based text detection and recognition in natural scenes are summarized.
Keywords: deep learning; natural scene; text detection; text recognition; end-to-end
Funding: National Key Research and Development Program of China (2018YFC1603302, 2018YFC1603305)
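Natural scene text refers to text content that appears in arbitrary natural settings, such as road signs, billboards, shopping mall direction signs, and product packaging. Scene text recognition (STR) usually first applies text detection to obtain the positions of text regions, and then applies text recognition to read the text content in the image patches cropped according to those positions. Unlike the regular text in document images, natural scene text varies widely in font size, typeface, layout orientation, color, and text density. It is also affected by varying illumination, complex backgrounds, and shooting angles, all of which make natural scene text detection and recognition difficult. At present, traditional OCR technology is not applicable to text recognition in complex natural scene images. With the development of information technology and the growing demand for intelligent applications, techniques for extracting text information from natural scene images have broad application prospects and have become a focus of research. The International Conference on Document Analysis and Recognition (ICDAR) is an important international conference that drives the field forward; in China, Tsinghua University and the Institute of Automation of the Chinese Academy of Sciences jointly hosted the 11th conference (ICDAR 2011) in 2011.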
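The detect-then-recognize pipeline described above can be sketched as follows. This is a minimal illustration, not any surveyed model: the names `crop`, `read_scene_text`, `detect`, and `recognize` are hypothetical, the image is a plain nested list of pixel values, and text regions are assumed to be axis-aligned boxes, whereas many detectors in the literature output rotated or curved regions.

```python
from typing import Callable, List, Sequence, Tuple

Image = List[List[int]]          # grayscale image as rows of pixel values
Box = Tuple[int, int, int, int]  # (left, top, right, bottom); right/bottom exclusive


def crop(image: Image, box: Box) -> Image:
    """Cut the axis-aligned region described by `box` out of `image`."""
    left, top, right, bottom = box
    return [row[left:right] for row in image[top:bottom]]


def read_scene_text(
    image: Image,
    detect: Callable[[Image], Sequence[Box]],
    recognize: Callable[[Image], str],
) -> List[Tuple[Box, str]]:
    """Two-stage pipeline: localize text regions, then transcribe each crop."""
    results = []
    for box in detect(image):        # stage 1: text detection
        patch = crop(image, box)     # crop by the detected position
        results.append((box, recognize(patch)))  # stage 2: text recognition
    return results
```

Any detector and recognizer with matching signatures can be plugged in. An end-to-end model, by contrast, would train both stages jointly instead of passing cropped patches between two independent components.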
References
Li YX, Ma JW. The developments and challenges of text detection algorithms. Journal of Signal Processing, 2017, 33(4):558-571. (in Chinese with English abstract). [doi: 10.16798/j.issn.1003-0530.2017.04.016]
Wang RM, Sang N, Ding D, Chen J, Ye QX, Gao CX, Liu L. Text detection in natural scene image: A survey. Acta Automatica Sinica, 2018, 44(12):2113-2141 (in Chinese with English abstract). http://kns.cnki.net/kcms/detail/11.2109.TP.20181010.1713.003.html [doi: 10.16383/j.aas.2018.c170572]
Neumann L, Matas J. A method for text localization and recognition in real-world images. In: Proc. of the Asian Conf. on Computer Vision. 2010. 770-783. [doi: 10.1007/978-3-642-19318-7_60]
Wang K, Babenko B, Belongie SJ. End-to-end scene text recognition. In: Proc. of the Int'l Conf. on Computer Vision. 2011. 1457-1464. [doi: 10.1109/ICCV.2011.6126402]
Hinton GE, Salakhutdinov R. Reducing the dimensionality of data with neural networks. Science, 2006, 313(5786):504-507. [doi: 10.1126/science.1127647]
Hochreiter S, Schmidhuber J. Long short-term memory. Neural Computation, 1997, 9(8):1735-1780.
[doi: 10.3115/v1/D14-1179]
Epshtein B, Ofek E, Wexler Y. Detecting text in natural scenes with stroke width transform. In: Proc. of the IEEE Computer Society Conf. on Computer Vision and Pattern Recognition. 2010. 2963-2970. [doi: 10.1109/CVPR.2010.5540041]
Matas J, Chum O, Urban M, Pajdla T. Robust wide-baseline stereo from maximally stable extremal regions. Image and Vision Computing, 2004, 22(10):761-767. [doi: 10.1016/j.imavis.2004.02.006]
Wang K, Belongie SJ. Word spotting in the wild. In: Proc. of the European Conf. on Computer Vision. 2010. 591-604. [doi: 10.1007/978-3-642-15549-9_43]
Tian S, Pan Y, Huang C, Lu S, Yu K, Tan CL. Text flow: A unified text detection system in natural scene images. In: Proc. of the Int'l Conf. on Computer Vision. 2015. 4651-4659. [doi: 10.1109/ICCV.2015.528]
Liao M, Shi B, Bai X. TextBoxes++: A single-shot oriented scene text detector. IEEE Trans. on Image Processing, 2018, 27(8):3676-3690. [doi: 10.1109/TIP.2018.2825107]
Liao M, Zhu Z, Shi B, Xia G, Bai X. Rotation-sensitive regression for oriented scene text detection. arXiv: 1803.05265, 2018.
Tian Z, Huang W, He T, He P, Qiao Y. Detecting text in natural image with connectionist text proposal network. In: Proc. of the European Conf. on Computer Vision. 2016. 56-72. [doi: 10.1007/978-3-319-46484-8_4]
[doi: 10.1109/CVPR.2017.371]
[doi: 10.1109/CVPR.2017.283]
Zhong Z, Sun L, Huo Q. An anchor-free region proposal network for faster R-CNN based text detection approaches. Int'l Journal on Document Analysis and Recognition, 2019, 22(3):315-327. [doi: 10.1007/s10032-019-00335-y]
Deng D, Liu H, Cai D, Li X. PixelLink: Detecting scene text via instance segmentation. In: Proc. of the National Conf. on Artificial Intelligence. 2018. 6773-6780.
Li X, Wang W, Hou W, Liu R, Lu T, Yang J. Shape robust text detection with progressive scale expansion network. arXiv: 1903.12473v2, 2019.
Xu Y, Wang Y, Zhou W, Wang Y, Yang Z, Bai X. TextField: Learning a deep direction field for irregular scene text detection. IEEE Trans. on Image Processing, 2019, 28(11):5566-5579. [doi: 10.1109/TIP.2019.2900589]
Zhu Y, Du J. TextMountain: Accurate scene text detection via instance segmentation. arXiv: 1811.12786, 2018.
Dai Y, Huang Z, Gao Y, Xu Y, Chen K, Guo J, Qiu W. Fused text segmentation networks for multi-oriented scene text detection. In: Proc. of the Int'l Conf. on Pattern Recognition. 2018. 3604-3609. [doi: 10.1109/ICPR.2018.8546066]
Li Y, Yu Y, Li Z, Lin Y, Xu M, Li J, Zhou X. Pixel-anchor: A fast oriented scene text detector with combined networks. arXiv: 1811.07432v1, 2018.
[doi: 10.1109/CVPR.2014.81]
Ren S, He K, Girshick RB, Sun J. Faster R-CNN: Towards real-time object detection with region proposal networks. IEEE Trans. on Pattern Analysis and Machine Intelligence, 2017, 39(6):1137-1149. [doi: 10.1109/TPAMI.2016.2577031]
Liu W, Anguelov D, Erhan D, Szegedy C, Reed SE, Fu C, Berg AC. SSD: Single shot MultiBox detector. In: Proc. of the European Conf. on Computer Vision. 2016. 21-37. [doi: 10.1007/978-3-319-46448-0_2]
Dai J, Li Y, He K, Sun J. R-FCN: Object detection via region-based fully convolutional networks. arXiv: 1605.06409v2, 2016.
[doi: 10.1109/CVPR.2016.91]
[doi: 10.1109/CVPR.2016.254]
Liao M, Shi B, Bai X, Wang X, Liu W. TextBoxes: A fast text detector with a single deep neural network. In: Proc. of the National Conf. on Artificial Intelligence. 2016. 4161-4167.
[doi: 10.1109/CVPR.2017.368]
Ma J, Shao W, Hao Y, Li W, Hong W, Zheng Y, Xue X. Arbitrary-oriented scene text detection via rotation proposals. IEEE Trans. on Multimedia, 2018, 20(11):3111-3122. [doi: 10.1109/TMM.2018.2818020]
Jiang Y, Zhu X, Wang X, Yang S, Luo Z. R2CNN: Rotational region CNN for arbitrarily-oriented scene text detection. In: Proc. of the Int'l Conf. on Pattern Recognition. 2018. [doi: 10.1109/ICPR.2018.8545598]
Zhu Y, Du J. Sliding line point regression for shape robust scene text detection. In: Proc. of the Int'l Conf. on Pattern Recognition. 2018. 3735-3740. [doi: 10.1109/icpr.2018.8545067]
He P, Huang W, He T, Zhu Q, Qiao Y, Li X. Single shot text detector with regional attention. In: Proc. of the Int'l Conf. on Computer Vision. 2017. 3066-3074. [doi: 10.1109/iccv.2017.331]
[doi: 10.1109/CVPR.2015.7298594]
Deng L, Gong Y, Lin Y, Shuai J, Tu X, Zhang Y, Ma Z, Xie M. Detecting multi-oriented text with corner-based region proposals. Neurocomputing, 2019, 334:134-142. [doi: 10.1016/j.neucom.2019.01.013]
He T, Huang W, Qiao Y, Yao J. Text-attentional convolutional neural network for scene text detection. IEEE Trans. on Image Processing, 2016, 25(6):2529-2541. [doi: 10.1109/TIP.2016.2547588]
Simonyan K, Zisserman A. Very deep convolutional networks for large-scale image recognition. In: Proc. of the Int'l Conf. on Learning Representations. 2015.
[doi: 10.1016/j.patcog.2019.06.020]
Liu J, Zhang C, Sun Y, Han J, Ding E. Detecting text in the wild with deep character embedding network. arXiv: 1901.00363, 2019.
Long J, Shelhamer E, Darrell T. Fully convolutional networks for semantic segmentation. IEEE Trans. on Pattern Analysis and Machine Intelligence, 2017, 39(4):640-651. [doi: 10.1109/TPAMI.2016.2572683]
[doi: 10.1109/CVPR.2017.106]
[doi: 10.1109/CVPR.2017.472]
Tian X, Wang L, Ding Q. Review of image semantic segmentation based on deep learning. Ruan Jian Xue Bao/Journal of Software, 2019, 30(2):440-468 (in Chinese with English abstract). http://www.jos.org.cn/1000-9825/5659.htm [doi: 10.13328/j.cnki.jos.005659]
He W, Zhang X, Yin F, Liu C. Deep direct regression for multi-oriented scene text detection. In: Proc. of the Int'l Conf. on Computer Vision. 2017. 745-753. [doi: 10.1109/ICCV.2017.87]
Song Y, Cui Y, Han H, Shan S, Chen X. Scene text detection via deep semantic feature fusion and attention-based refinement. In: Proc. of the Int'l Conf. on Pattern Recognition. 2018. 3747-3752. [doi: 10.1109/icpr.2018.8546050]
Xue C, Lu S, Zhan F. Accurate scene text detection through border semantics awareness and bootstrapping. In: Proc. of the European Conf. on Computer Vision. 2018. 370-387.
Long S, Ruan J, Zhang W, He X, Wu W, Yao C. TextSnake: A flexible representation for detecting text of arbitrary shapes. In: Proc. of the European Conf. on Computer Vision. 2018. 19-35. [doi: 10.1007/978-3-030-01216-8_2]
[doi: 10.1109/CVPR.2016.451]
He T, Huang W, Qiao Y, Yao J. Accurate text localization in natural image with cascaded convolutional text network. arXiv: 1603.09423, 2016.
Wu Y, Natarajan P. Self-organized text detection with minimal post-processing via border learning. In: Proc. of the Int'l Conf. on Computer Vision. 2017. 5010-5019. [doi: 10.1109/iccv.2017.535]
Polzounov A, Ablavatski A, Escalera S, Lu S, Cai J. Wordfence: Text detection in natural images with border awareness. In: Proc. of the IEEE Int'l Conf. on Image Processing. 2017. 1222-1226. [doi: 10.1109/icip.2017.8296476]
Xue C, Lu S, Zhang W. MSR: Multi-scale shape regression for scene text detection. arXiv: 1901.02596v2, 2019.
Bazazian D, Gomez R, Nicolaou A, Bigorda LGI, Karatzas D, Bagdanov AD. Improving text proposals for scene images with fully convolutional networks. arXiv: 1702.05089, 2017.
Jiang F, Hao Z, Liu X. Deep scene text detection with connected component proposals. arXiv: 1708.05133, 2017.
[doi: 10.1109/CVPR.2018.00788]
Yang Q, Cheng M, Zhou W, Chen Y, Qiu M, Lin W. IncepText: A new inception-text module with deformable PSROI pooling for multi-oriented scene text detection. In: Proc. of the Int'l Joint Conf. on Artificial Intelligence. 2018. 1071-1077. [doi: 10.24963/ijcai.2018/149]
Bissacco A, Cummins MJ, Netzer Y, Neven H. PhotoOCR: Reading text in uncontrolled conditions. In: Proc. of the Int'l Conf. on Computer Vision. 2013. 785-792.
Goel V, Mishra A, Alahari K, Jawahar CV. Whole is greater than sum of parts: Recognizing scene text words. In: Proc. of the Int'l Conf. on Document Analysis and Recognition. 2013. 398-402. [doi: 10.1109/ICDAR.2013.87]
Jaderberg M, Simonyan K, Vedaldi A, Zisserman A. Synthetic data and artificial neural networks for natural scene text recognition. arXiv: 1406.2227v4, 2014.
Jaderberg M, Simonyan K, Vedaldi A, Zisserman A. Deep structured output learning for unconstrained text recognition. arXiv: 1412.5903v5, 2014.
Shi B, Bai X, Yao C. An end-to-end trainable neural network for image-based sequence recognition and its application to scene text recognition. IEEE Trans. on Pattern Analysis and Machine Intelligence, 2017, 39(11):2298-2304. [doi: 10.1109/TPAMI.2016.2646371]
He P, Huang W, Qiao Y, Loy CC, Tang X. Reading scene text in deep convolutional sequences. In: Proc. of the AAAI Conf. on Artificial Intelligence. 2016. 3501-3508.
Wu Y, Yin F, Zhang X, Liu L, Liu C. SCAN: Sliding convolutional attention network for scene text recognition. arXiv: 1603.09423, 2018.
[doi: 10.1109/CVPR.2018.00584]
[doi: 10.1109/CVPR.2018.00163]
Graves A, Fernandez S, Gomez FJ, Schmidhuber J. Connectionist temporal classification: Labelling unsegmented sequence data with recurrent neural networks. In: Proc. of the Int'l Conf. on Machine Learning. 2006. 369-376. [doi: 10.1145/1143844.1143891]
Yang X, He D, Huang W, Zhou Z, Ororbia AG, Kifer D, Giles CL. Smart library: Identifying books in a library using richly supervised deep scene text reading. In: Proc. of the Joint Conf. on Digital Libraries. 2016. [doi: 10.1109/JCDL.2017.7991581]
Yang C, Yin X, Li Z, Wu J, Guo C, Wang H, Xiao L. AdaDNNs: Adaptive ensemble of deep neural networks for scene text recognition. arXiv: 1710.03425, 2017.
Wojna Z, Gorban AN, Lee D, Murphy KP, Yu Q, Li Y, Ibarz J. Attention-based extraction of structured information from street view imagery. In: Proc. of the Int'l Conf. on Document Analysis and Recognition. 2017. 844-850. [doi: 10.1109/ICDAR.2017.143]
[doi: 10.1109/CVPR.2016.245]
Ghosh SK, Valveny E, Bagdanov AD. Visual attention models for scene text recognition. In: Proc. of the Int'l Conf. on Document Analysis and Recognition. 2017. 943-948. [doi: 10.1109/icdar.2017.158]
Liu W, Chen C, Wong KK. SAFE: Scale aware feature encoder for scene text recognition. In: Proc. of the Asian Conf. on Computer Vision. 2019. 196-211. [doi: 10.1007/978-3-030-20890-5_13]
[doi: 10.1109/CVPR.2016.452]
Shi B, Yang M, Wang X, Lyu P, Yao C, Bai X. ASTER: An attentional scene text recognizer with flexible rectification. IEEE Trans. on Pattern Analysis and Machine Intelligence, 2019, 41(9):2035-2048. [doi: 10.1109/TPAMI.2018.2848939]
Zhan F, Lu S. ESIR: End-to-end scene text recognition via iterative image rectification. arXiv: 1812.05824v3, 2018.
Luo C, Jin L, Sun Z. MORAN: A multi-object rectified attention network for scene text recognition. Pattern Recognition, 2019, 90(12):109-118. [doi: 10.1016/j.patcog.2019.01.020]
Cheng Z, Bai F, Xu Y, Zheng G, Pu S, Zhou S. Focusing attention: Towards accurate text recognition in natural images. In: Proc. of the Int'l Conf. on Computer Vision. 2017. 5086-5094. [doi: 10.1109/ICCV.2017.543]
Wang T, Wu DJ, Coates A, Ng AY. End-to-end text recognition with convolutional neural networks. In: Proc. of the Int'l Conf. on Pattern Recognition. 2012. 3304-3308.
[doi: 10.1109/CVPR.2018.00527]
Busta M, Neumann L, Matas J. Deep TextSpotter: An end-to-end trainable scene text localization and recognition framework. In: Proc. of the Int'l Conf. on Computer Vision. 2017. 2223-2231. [doi: 10.1109/ICCV.2017.242]
Li H, Wang P, Shen C. Towards end-to-end text spotting with convolutional recurrent neural networks. In: Proc. of the Int'l Conf. on Computer Vision. 2017. 5248-5256. [doi: 10.1109/iccv.2017.560]
[doi: 10.1109/CVPR.2018.00595]
[doi: 10.1109/TPAMI.2019.2937086]
Goodfellow IJ, Bulatov Y, Ibarz J, Arnoud SC, Shet VD. Multi-digit number recognition from street view imagery using deep convolutional neural networks. In: Proc. of the Int'l Conf. on Learning Representations. 2014.
[doi: 10.1109/CVPR.2017.690]
Sui W, Zhang Q, Yang J, Chu W. A novel integrated framework for learning both text detection and recognition. In: Proc. of the Int'l Conf. on Pattern Recognition. 2018. 2233-2238. [doi: 10.1109/icpr.2018.8545047]
Gehring J, Auli M, Grangier D, Yarats D, Dauphin YN. Convolutional sequence to sequence learning. In: Proc. of the Int'l Conf. on Machine Learning. 2017. 1243-1252.
Bartz C, Yang H, Meinel C. SEE: Towards semi-supervised end-to-end scene text recognition. In: Proc. of the National Conf. on Artificial Intelligence. 2018. 6674-6681.
Lucas SM, Panaretos A, Sosa L, Tang A, Wong S, Young R. ICDAR 2003 robust reading competitions. In: Proc. of the Int'l Conf. on Document Analysis and Recognition. 2003. 105-122. [doi: 10.1109/ICDAR.2003.1227749]
Karatzas D, Shafait F, Uchida S, Iwamura M, Bigorda LGI, Mestre SR, Mas J, Mota DF, Almazan J, Heras LDL. ICDAR 2013 robust reading competition. In: Proc. of the Int'l Conf. on Document Analysis and Recognition. 2013. 1484-1493. [doi: 10.1109/ICDAR.2013.221]
Karatzas D, Gomezbigorda L, Nicolaou A, Ghosh SK, Bagdanov AD, Iwamura M, Matas J, Neumann L, Chandrasekhar VR, Lu S. ICDAR 2015 competition on robust reading. In: Proc. of the Int'l Conf. on Document Analysis and Recognition. 2015. 1156-1160. [doi: 10.1109/ICDAR.2015.7333942]
Yao C, Bai X, Liu W, Ma Y, Tu Z. Detecting texts of arbitrary orientations in natural images. In: Proc. of the IEEE Conf. on Computer Vision and Pattern Recognition. 2012. 1083-1090. [doi: 10.1109/CVPR.2012.6247787]
[doi: 10.1109/ICDAR.2017.237]
Gomez R, Shi B, Gomez L, Neumann L, Veit A, Matas J, Belongie SJ, Karatzas D. ICDAR2017 robust reading challenge on COCO-text. In: Proc. of the Int'l Conf. on Document Analysis and Recognition. 2017. 1435-1443. [doi: 10.1109/ICDAR.2017.234]
[doi: 10.1109/ICDAR.2017.233]
Yuan T, Zhu Z, Xu K, Li C, Hu S. Chinese text in the wild. arXiv: 1803.00085v1, 2018.
Liu Y, Jin L, Zhang S, Zhang S. Detecting curve text in the wild: New dataset and new solution. arXiv: 1803.00085, 2017.
Chng CK, Chan CS. Total-text: A comprehensive dataset for scene text detection and recognition. In: Proc. of the Int'l Conf. on Document Analysis and Recognition. 2017. 935-942. [doi: 10.1109/ICDAR.2017.157]
Wolf C, Jolion JM. Object count/area graphs for the evaluation of object detection and segmentation algorithms. Int'l Journal on Document Analysis and Recognition, 2006, 8(4):280-296. [doi: 10.1007/s10032-006-0014-0]
Jaderberg M, Simonyan K, Vedaldi A, Zisserman A. Reading text in the wild with convolutional neural networks. Int'l Journal of Computer Vision, 2016, 116(1):1-20. [doi: 10.1007/s11263-015-0823-z]
He K, Gkioxari G, Dollar P, Girshick RB. Mask R-CNN. IEEE Trans. on Pattern Analysis and Machine Intelligence, 2020, 42(2):386-397. [doi: 10.1109/TPAMI.2018.2844175]
Wang T, Jiang JH. Text recognition in any direction based on semantic segmentation. Applied Science and Technology, 2018, 45(3):59-64 (in Chinese with English abstract). http://kns.cnki.net/kcms/detail/23.1191.U.20170704.1807.006.html [doi: 10.11991/yykj.201705006]
Zhan F, Xue C, Lu S. GA-DAN: Geometry-aware domain adaptation network for scene text detection and recognition. In: Proc. of the IEEE Int'l Conf. on Computer Vision. 2019.