Overview on 2D Human Pose Estimation Based on Deep Learning
Authors: 张宇, 温光照, 米思娅, 张敏灵, 耿新

Author biographies:

张宇 (1986-), male, Ph.D., associate professor, CCF professional member; main research interests: computer vision, machine learning, and deep learning. 张敏灵 (1979-), male, Ph.D., professor, doctoral supervisor, CCF distinguished member; main research interests: machine learning and data mining. 温光照 (1997-), male, M.S.; main research interest: computer vision. 耿新 (1978-), male, Ph.D., professor, doctoral supervisor, CCF distinguished member; main research interests: machine learning, pattern recognition, and computer vision. 米思娅 (1988-), female, Ph.D., lecturer; main research interests: data processing for network security and computer vision.

Corresponding author:

米思娅, E-mail: SiyaMi@seu.edu.cn

Funding:

National Key Research and Development Program of China (2018AAA0100100); National Natural Science Foundation of China (61702095); Natural Science Foundation of Jiangsu Province (BK20190341)


Abstract:

Human pose estimation is a fundamental and challenging task in computer vision. It is essential for describing human posture and behavior, and it underpins downstream tasks such as action recognition and action detection. In recent years, with the development of deep learning, deep learning-based human pose estimation algorithms have achieved excellent results. This survey reviews the recent development of deep learning-based 2D human pose estimation algorithms along the three mainstream approaches: single-person pose estimation, top-down multi-person pose estimation, and bottom-up multi-person pose estimation. The current difficulties and challenges of 2D human pose estimation are then discussed, and finally an outlook on the future development of human pose estimation is given.
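To make the difference between the two multi-person paradigms concrete, the minimal Python sketch below outlines the shape of each pipeline. It is an illustration only, not code from any surveyed method: detect_people, estimate_single_pose, detect_all_keypoints, and group_keypoints are hypothetical placeholders for a person detector, a single-person pose network, a whole-image keypoint detector, and a keypoint grouping step.

```python
# Minimal illustrative sketch (placeholder components, not the surveyed
# authors' code) contrasting the two multi-person pose estimation paradigms.
import numpy as np


def top_down_pose_estimation(image, detect_people, estimate_single_pose):
    """Top-down: detect person boxes first, then estimate one pose per crop."""
    poses = []
    for (x0, y0, x1, y1) in detect_people(image):
        crop = image[y0:y1, x0:x1]                    # per-person region
        keypoints = estimate_single_pose(crop)        # (K, 2) in crop coordinates
        poses.append(keypoints + np.array([x0, y0]))  # map back to image coordinates
    return poses


def bottom_up_pose_estimation(image, detect_all_keypoints, group_keypoints):
    """Bottom-up: detect every keypoint in the whole image, then group the
    keypoints into individual people (the surveyed methods use cues such as
    part affinity fields or associative embeddings for this step)."""
    keypoints, grouping_cues = detect_all_keypoints(image)
    return group_keypoints(keypoints, grouping_cues)


if __name__ == "__main__":
    image = np.zeros((480, 640, 3), dtype=np.uint8)   # dummy input frame

    # Dummy stand-ins so the sketch runs end to end.
    detect_people = lambda img: [(10, 20, 110, 220), (300, 40, 420, 260)]
    estimate_single_pose = lambda crop: np.zeros((17, 2))  # 17 COCO-style joints
    detect_all_keypoints = lambda img: (np.zeros((34, 2)), np.zeros(34))
    group_keypoints = lambda kpts, cues: [kpts[:17], kpts[17:]]

    print(len(top_down_pose_estimation(image, detect_people, estimate_single_pose)),
          "poses (top-down)")
    print(len(bottom_up_pose_estimation(image, detect_all_keypoints, group_keypoints)),
          "poses (bottom-up)")
```

The structural trade-off follows directly from the sketch: a top-down pipeline runs the pose network once per detected person, so its cost grows with the number of people, while a bottom-up pipeline runs the keypoint detector once on the whole image and shifts the difficulty into grouping the detected keypoints into individuals.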

Cite this article:

张宇, 温光照, 米思娅, 张敏灵, 耿新. Overview on 2D human pose estimation based on deep learning. Ruan Jian Xue Bao/Journal of Software, 2022, 33(11): 4173-4191 (in Chinese with English abstract).

History:
  • Received: 2020-01-18
  • Last revised: 2021-01-06
  • Published online: 2021-08-02
  • Publication date: 2022-11-06