Overview on 2D Human Pose Estimation Based on Deep Learning
Authors: 张宇, 温光照, 米思娅, 张敏灵, 耿新

Author biographies:

张宇 (1986-), male, Ph.D., associate professor, CCF professional member; main research interests: computer vision, machine learning, and deep learning. 张敏灵 (1979-), male, Ph.D., professor, doctoral supervisor, CCF distinguished member; main research interests: machine learning and data mining. 温光照 (1997-), male, M.S.; main research interest: computer vision. 耿新 (1978-), male, Ph.D., professor, doctoral supervisor, CCF distinguished member; main research interests: machine learning, pattern recognition, and computer vision. 米思娅 (1988-), female, Ph.D., lecturer; main research interests: data processing for network security and computer vision.

Corresponding author:

米思娅, E-mail: SiyaMi@seu.edu.cn

Funding:

National Key Research and Development Program of China (2018AAA0100100); National Natural Science Foundation of China (61702095); Natural Science Foundation of Jiangsu Province (BK20190341)


Abstract:

Human pose estimation is a fundamental and challenging task in computer vision. It is essential for describing human posture and behavior, and it underpins downstream tasks such as action recognition and action detection. In recent years, with the development of deep learning, deep learning-based human pose estimation algorithms have achieved excellent results. This survey reviews the recent development of deep learning-based 2D human pose estimation algorithms along the three mainstream approaches: single-person pose estimation, top-down multi-person pose estimation, and bottom-up multi-person pose estimation. The current difficulties and challenges of 2D human pose estimation are then discussed, and finally an outlook on the future development of human pose estimation is given.
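To make the difference between the two multi-person paradigms concrete, the minimal Python sketch below outlines the shape of each pipeline. It is an illustration only, not code from any surveyed method: detect_people, estimate_single_pose, detect_all_keypoints, and group_keypoints are hypothetical placeholders for a person detector, a single-person pose network, a whole-image keypoint detector, and a keypoint grouping step.

```python
# Minimal illustrative sketch (placeholder components, not the surveyed
# authors' code) contrasting the two multi-person pose estimation paradigms.
import numpy as np


def top_down_pose_estimation(image, detect_people, estimate_single_pose):
    """Top-down: detect person boxes first, then estimate one pose per crop."""
    poses = []
    for (x0, y0, x1, y1) in detect_people(image):
        crop = image[y0:y1, x0:x1]                    # per-person region
        keypoints = estimate_single_pose(crop)        # (K, 2) in crop coordinates
        poses.append(keypoints + np.array([x0, y0]))  # map back to image coordinates
    return poses


def bottom_up_pose_estimation(image, detect_all_keypoints, group_keypoints):
    """Bottom-up: detect every keypoint in the whole image, then group the
    keypoints into individual people (the surveyed methods use cues such as
    part affinity fields or associative embeddings for this step)."""
    keypoints, grouping_cues = detect_all_keypoints(image)
    return group_keypoints(keypoints, grouping_cues)


if __name__ == "__main__":
    image = np.zeros((480, 640, 3), dtype=np.uint8)   # dummy input frame

    # Dummy stand-ins so the sketch runs end to end.
    detect_people = lambda img: [(10, 20, 110, 220), (300, 40, 420, 260)]
    estimate_single_pose = lambda crop: np.zeros((17, 2))  # 17 COCO-style joints
    detect_all_keypoints = lambda img: (np.zeros((34, 2)), np.zeros(34))
    group_keypoints = lambda kpts, cues: [kpts[:17], kpts[17:]]

    print(len(top_down_pose_estimation(image, detect_people, estimate_single_pose)),
          "poses (top-down)")
    print(len(bottom_up_pose_estimation(image, detect_all_keypoints, group_keypoints)),
          "poses (bottom-up)")
```

The structural trade-off follows directly from the sketch: a top-down pipeline runs the pose network once per detected person, so its cost grows with the number of people, while a bottom-up pipeline runs the keypoint detector once on the whole image and shifts the difficulty into grouping the detected keypoints into individuals.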

Cite this article:

张宇, 温光照, 米思娅, 张敏灵, 耿新. Overview on 2D human pose estimation based on deep learning. Ruan Jian Xue Bao/Journal of Software, 2022, 33(11): 4173-4191 (in Chinese with English abstract).

History:
  • Received: 2020-01-18
  • Last revised: 2021-01-06
  • Published online: 2021-08-02
  • Publication date: 2022-11-06