Deep Learning for Multi-scale Object Detection: A Survey

doi:10.13328/j.cnki.jos.006166


Journal of Software, 2021, Vol. 32, No. 4, pp. 1201-1227

Authors:

CHEN Ke-Qi
School of Computer Science and Technology, University of Chinese Academy of Sciences, Beijing 100190, China; State Key Laboratory of Computer Science (Institute of Software, Chinese Academy of Sciences), Beijing 100190, China; Beijing Key Laboratory of Human-computer Interaction (Institute of Software, Chinese Academy of Sciences), Beijing 100190, China

ZHU Zhi-Liang
State Key Laboratory of Computer Science (Institute of Software, Chinese Academy of Sciences), Beijing 100190, China; Beijing Key Laboratory of Human-computer Interaction (Institute of Software, Chinese Academy of Sciences), Beijing 100190, China; School of Software, East China Jiaotong University, Nanchang 330013, China

DENG Xiao-Ming
State Key Laboratory of Computer Science (Institute of Software, Chinese Academy of Sciences), Beijing 100190, China; Beijing Key Laboratory of Human-computer Interaction (Institute of Software, Chinese Academy of Sciences), Beijing 100190, China

MA Cui-Xia
School of Computer Science and Technology, University of Chinese Academy of Sciences, Beijing 100190, China; State Key Laboratory of Computer Science (Institute of Software, Chinese Academy of Sciences), Beijing 100190, China; Beijing Key Laboratory of Human-computer Interaction (Institute of Software, Chinese Academy of Sciences), Beijing 100190, China

WANG Hong-An
School of Computer Science and Technology, University of Chinese Academy of Sciences, Beijing 100190, China; State Key Laboratory of Computer Science (Institute of Software, Chinese Academy of Sciences), Beijing 100190, China; Beijing Key Laboratory of Human-computer Interaction (Institute of Software, Chinese Academy of Sciences), Beijing 100190, China

Author biographies:
CHEN Ke-Qi (1997-), male, master's student. Main research interest: computer vision.
ZHU Zhi-Liang (1988-), male, Ph.D., lecturer. Main research interests: intelligent image perception and enhancement, human-computer interaction.
DENG Xiao-Ming (1980-), male, Ph.D., associate researcher, CCF senior member. Main research interests: computer vision, human-computer interaction.
MA Cui-Xia (1975-), female, Ph.D., researcher, doctoral supervisor, CCF senior member. Main research interests: human-computer interaction, visual analytics of media big data.
WANG Hong-An (1963-), male, Ph.D., researcher, doctoral supervisor, CCF senior member. Main research interests: natural human-computer interaction, real-time intelligent computing.

Corresponding author: MA Cui-Xia, E-mail: cuixia@iscas.ac.cn

Fund Project: National Key Research and Development Program of China (2016YFB1001200); National Natural Science Foundation of China (61872346)


Abstract:

Object detection is a classic computer vision task that aims to return the classes and bounding-box coordinates of one or more objects of specific categories in a given image. With the rapid development of neural networks, the R-CNN detector marked the entry of object detection into the deep learning era, and deep-learning-based detectors have since shown large gains in both speed and accuracy over traditional algorithms. However, precisely detecting objects across a large range of scales, known as the scale problem, remains a challenge even for deep learning methods: detection accuracy drops markedly for extremely large or extremely small objects. Many researchers have therefore studied multi-scale object detection in recent years. Although a series of surveys have summarized deep-learning-based object detectors in terms of algorithm procedure, network structure, training, and datasets, very few concentrate on multi-scale object detection. This paper first reviews the foundations of the two main streams of deep-learning-based detectors: two-stage detectors represented by the R-CNN series and one-stage detectors represented by YOLO and SSD. It then discusses typical strategies for multi-scale object detection, including image pyramids and in-network feature pyramids. Finally, it summarizes the current state of multi-scale object detection and outlines future research directions.

Key words: object detection; deep learning; scale problem; multi-scale feature
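As an illustration of the image pyramid strategy named in the abstract, the following is a minimal sketch, not the survey's own method. It assumes a detector is run on successively downscaled copies of an image and that each detected box is then mapped back to the original resolution. The function names `pyramid_sizes` and `box_to_original`, the `scale_step` and `min_side` parameters, and their default values are hypothetical choices made for this example.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)


def pyramid_sizes(width: int, height: int, scale_step: float = 0.5,
                  min_side: int = 32) -> List[Tuple[float, int, int]]:
    """Return (scale, width, height) for each pyramid level, largest first.

    scale_step and min_side are illustrative assumptions, not values
    taken from the survey.
    """
    if not 0.0 < scale_step < 1.0:
        raise ValueError("scale_step must be in (0, 1)")
    if min_side < 1:
        raise ValueError("min_side must be at least 1")
    levels = []
    scale = 1.0
    while True:
        w, h = round(width * scale), round(height * scale)
        # Stop once the shorter side falls below the smallest usable size.
        if min(w, h) < min_side:
            break
        levels.append((scale, w, h))
        scale *= scale_step
    return levels


def box_to_original(box: Box, scale: float) -> Box:
    """Map a box found on a rescaled image back to original coordinates."""
    x_min, y_min, x_max, y_max = box
    return (x_min / scale, y_min / scale, x_max / scale, y_max / scale)


if __name__ == "__main__":
    for scale, w, h in pyramid_sizes(640, 480):
        print(f"scale={scale:g} size={w}x{h}")
    print(box_to_original((10, 20, 30, 40), 0.5))
```

Under these assumptions, a fixed-size detector sees large objects at a manageable size on the coarse levels and small objects at full resolution on the finest level, at the cost of running inference once per level.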


Cite this article

Chen KQ, Zhu ZL, Deng XM, Ma CX, Wang HA. Deep learning for multi-scale object detection: A survey. Journal of Software, 2021, 32(4): 1201-1227

History

Received: 2020-08-10
Last revised: 2020-09-20
Published online: 2020-12-02
Publication date: 2021-04-06
