Parallel Multi-scale Spatio-temporal Graph Convolutional Network for 3D Human Pose Estimation

doi:10.13328/j.cnki.jos.007200


Journal of Software, 2025, 36(5): 2151–2166.

CSTR: 32375.14.jos.007200
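Authors:
YANG Hong-Hong (杨红红), Key Laboratory of Modern Teaching Technology (Shaanxi Normal University), Ministry of Education, Xi’an 710062, China; Key Laboratory of Intelligent Computing and Service Technology for Folk Song (Shaanxi Normal University), Ministry of Culture and Tourism, Xi’an 710062, China
LIU Hong-Xi (刘泓希), Key Laboratory of Modern Teaching Technology (Shaanxi Normal University), Ministry of Education, Xi’an 710062, China
ZHANG Yu-Mei (张玉梅), Key Laboratory of Intelligent Computing and Service Technology for Folk Song (Shaanxi Normal University), Ministry of Culture and Tourism, Xi’an 710062, China; School of Computer Science, Shaanxi Normal University, Xi’an 710062, China
WU Xiao-Jun (吴晓军), Key Laboratory of Modern Teaching Technology (Shaanxi Normal University), Ministry of Education, Xi’an 710062, China; Key Laboratory of Intelligent Computing and Service Technology for Folk Song (Shaanxi Normal University), Ministry of Culture and Tourism, Xi’an 710062, China; School of Computer Science, Shaanxi Normal University, Xi’an 710062, China
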
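CLC number: TP391
Fund projects: National Natural Science Foundation of China (61907028, 11872036); Shaanxi Provincial Youth Science and Technology Nova Program (2021KJXX-91); Projects funded by the Key Laboratory of the Ministry of Culture and Tourism (2023-02, 2022-13); General Program of the Natural Science Foundation of Shaanxi Province (2024JC-YBMS-503)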



Abstract:

Human pose estimation (HPE) methods based on graph convolutional networks (GCNs) cannot sufficiently aggregate the spatio-temporal features of skeleton joints, which limits the extraction of discriminative features. To address this problem and improve the performance of 3D HPE, this paper builds a parallel multi-scale spatio-temporal graph convolutional network (PMST-GNet). First, a diagonally dominant spatio-temporal attention graph convolutional layer (DDA-STGConv) is designed. It constructs a cross-domain spatio-temporal adjacency matrix and models the skeleton joint features under both a self-constraint and an attention-mechanism constraint, which strengthens the information interaction among nodes. Then, a graph topology aggregation function is devised to construct different graph topologies, and a parallel multi-scale sub-graph network module (PM-SubGNet) is built with DDA-STGConv as its basic unit. Finally, to better extract the contextual information of skeleton joints, a multi-scale feature cross fusion block (MFEB) is designed. It lets multi-scale information interact across the parallel sub-graph networks and improves the feature representation capability of the GCN. Comparative experiments on the mainstream 3D HPE datasets Human3.6M and MPI-INF-3DHP show that the proposed PMST-GNet performs well on 3D HPE and outperforms current mainstream GCN-based methods such as Sem-GCN, GraphSH, and UGCN.

Key words: 3D human pose estimation (3D HPE); diagonally dominant spatio-temporal attention graph convolution; parallel multi-scale sub-graph network; multi-scale feature cross fusion
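The DDA-STGConv step in the abstract combines three ingredients in a single graph convolution: an adjacency matrix that spans both joints and frames, a diagonal self-constraint, and attention. The Python sketch below shows one plausible way these could fit together. It is not the authors' formulation. The node indexing t·J + j, the edge set (bone edges inside a frame plus same-joint edges between consecutive frames), the reading of the self-constraint as strict row diagonal dominance with a hypothetical `margin`, and the dot-product attention are all assumptions. Every function name is also hypothetical.

```python
import math


def matmul(a, b):
    """Product of an (n x k) and a (k x m) matrix stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def build_st_adjacency(edges, num_joints, num_frames):
    """Assumed cross-domain adjacency over all T*J joint instances.

    Node (t, j) gets index t * num_joints + j. Spatial edges connect
    bone-linked joints inside a frame. Temporal edges connect the same
    joint in consecutive frames.
    """
    n = num_joints * num_frames
    adj = [[0.0] * n for _ in range(n)]
    for t in range(num_frames):
        base = t * num_joints
        for a, b in edges:                      # spatial (intra-frame) edges
            adj[base + a][base + b] = adj[base + b][base + a] = 1.0
        if t + 1 < num_frames:                  # temporal (inter-frame) edges
            for j in range(num_joints):
                u, v = base + j, base + num_joints + j
                adj[u][v] = adj[v][u] = 1.0
    return adj


def make_diagonally_dominant(adj, margin=1.0):
    """Assumed self-constraint: a_ii = sum_{k != i} |a_ik| + margin,
    so every row is strictly diagonally dominant."""
    out = [row[:] for row in adj]
    for i, row in enumerate(out):
        row[i] = sum(abs(x) for k, x in enumerate(row) if k != i) + margin
    return out


def attention_weights(feats, adj):
    """Adjacency-weighted softmax over each node's neighbourhood:
    w_ik = a_ik * exp(<x_i, x_k>) / sum_l a_il * exp(<x_i, x_l>).
    Nodes with a_ik == 0 receive zero weight."""
    weights = []
    for i, xi in enumerate(feats):
        scores = [sum(p * q for p, q in zip(xi, xk)) for xk in feats]
        # Subtract the largest neighbour score for numerical stability.
        m = max(s for s, a in zip(scores, adj[i]) if a != 0.0)
        row = [a * math.exp(s - m) if a != 0.0 else 0.0
               for s, a in zip(scores, adj[i])]
        z = sum(row)
        weights.append([w / z for w in row])
    return weights


def dda_st_gconv(feats, adj, theta):
    """Hypothetical DDA-STGConv-style layer: ReLU(W X Theta), where W mixes
    the diagonally dominant adjacency with feature attention."""
    w = attention_weights(feats, make_diagonally_dominant(adj))
    out = matmul(matmul(w, feats), theta)
    return [[max(0.0, v) for v in row] for row in out]


# Toy skeleton: 3 joints in a chain (0-1, 1-2) observed over 2 frames -> 6 nodes.
adj = build_st_adjacency([(0, 1), (1, 2)], num_joints=3, num_frames=2)
x = [[1.0, 0.0], [0.5, 0.5], [0.0, 1.0],   # frame 0, 2 features per joint
     [1.0, 0.2], [0.4, 0.6], [0.1, 0.9]]   # frame 1
theta = [[1.0, -0.5], [0.5, 1.0]]          # hypothetical learnable weights
h = dda_st_gconv(x, adj, theta)
print(len(h), len(h[0]))  # 6 2
```

Raising each diagonal entry above the sum of its row's off-diagonal weights biases every node toward its own feature before attention redistributes weight among its spatial and temporal neighbours. In a multi-scale design like PM-SubGNet, one would presumably run such layers over several different graph topologies in parallel. How the paper's graph topology aggregation function builds those topologies is not specified here.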

References

[1] Zhang Y, Wen GZ, Mi SY, Zhang ML, Geng X. Overview on 2D human pose estimation based on deep learning. Ruan Jian Xue Bao/Journal of Software, 2022, 33(11): 4173–4191 (in Chinese with English abstract). http://www.jos.org.cn/1000-9825/6390.htm

[2] Ding J, Shu XB, Huang P, Yao YZ, Song Y. Multimodal and multi-granularity graph convolutional networks for elderly daily activity recognition. Ruan Jian Xue Bao/Journal of Software, 2023, 34(5): 2350–2364 (in Chinese with English abstract). http://www.jos.org.cn/1000-9825/6439.htm

[3] Yang HH, Liu HX, Zhang YM, Wu XJ. FMR-GNet: Forward mix-hop spatial-temporal residual graph network for 3D pose estimation. Chinese Journal of Electronics, 2024, 33(6): 1–14.

[4] Moon G, Lee KM. I2L-MeshNet: Image-to-lixel prediction network for accurate 3D human pose and mesh estimation from a single RGB image. In: Proc. of the 16th European Conf. on Computer Vision (ECCV). Glasgow: Springer, 2020. 752–768. [doi: 10.1007/978-3-030-58571-6_44]

[5] Zhao L, Peng X, Tian Y, Kapadia M, Metaxas DN. Semantic graph convolutional networks for 3D human pose regression. In: Proc. of the 2019 IEEE/CVF Conf. on Computer Vision and Pattern Recognition (CVPR). Long Beach: IEEE, 2019. 3420–3430.

[6] Liu JF, Rojas J, Li YH, Liang ZJ, Guan YS, Xi N, Zhu HF. A graph attention spatio-temporal convolutional network for 3D human pose estimation in video. In: Proc. of the 2021 IEEE Int’l Conf. on Robotics and Automation. Xi’an: IEEE, 2021. 3374–3380.

[7] Cai YJ, Ge LH, Liu J, Cai JF, Cham TJ, Yuan JS, Thalmann NM. Exploiting spatial-temporal relationships for 3D pose estimation via graph convolutional networks. In: Proc. of the 2019 IEEE/CVF Int’l Conf. on Computer Vision (ICCV). Seoul: IEEE, 2019. 2272–2281. [doi: 10.1109/ICCV.2019.00236]

[8] Kipf TN, Welling M. Semi-supervised classification with graph convolutional networks. In: Proc. of the 5th Int’l Conf. on Learning Representations. Toulon: OpenReview.net, 2017.

[9] Wu YP, Kong DH, Wang SF, Li JH, Yin BC. HPGCN: Hierarchical poselet-guided graph convolutional network for 3D pose estimation. Neurocomputing, 2022, 487: 243–256.

[10] Wang WG, Shen JB, Jia YD. Review of visual attention detection. Ruan Jian Xue Bao/Journal of Software, 2019, 30(2): 416–439 (in Chinese with English abstract). http://www.jos.org.cn/1000-9825/5636.htm

[11] Vaswani A, Shazeer N, Parmar N, Uszkoreit J, Jones L, Gomez AN, Kaiser Ł, Polosukhin I. Attention is all you need. In: Proc. of the 31st Int’l Conf. on Neural Information Processing Systems. Long Beach: Curran Associates Inc., 2017. 6000–6010.

[12] Pavllo D, Feichtenhofer C, Grangier D, Auli M. 3D human pose estimation in video with temporal convolutions and semi-supervised training. In: Proc. of the 2019 IEEE/CVF Conf. on Computer Vision and Pattern Recognition (CVPR). Long Beach: IEEE, 2019. 7745–7754. [doi: 10.1109/CVPR.2019.00794]

[13] Sun K, Xiao B, Liu D, Wang JD. Deep high-resolution representation learning for human pose estimation. In: Proc. of the 2019 IEEE/CVF Conf. on Computer Vision and Pattern Recognition (CVPR). Long Beach: IEEE, 2019. 5686–5696. [doi: 10.1109/CVPR.2019.00584]

[14] Ionescu C, Papava D, Olaru V, Sminchisescu C. Human3.6M: Large scale datasets and predictive methods for 3D human sensing in natural environments. IEEE Trans. on Pattern Analysis and Machine Intelligence, 2014, 36(7): 1325–1339.

[15] Mehta D, Rhodin H, Casas D, Fua P, Sotnychenko O, Xu WP, Theobalt C. Monocular 3D human pose estimation in the wild using improved CNN supervision. In: Proc. of the 2017 Int’l Conf. on 3D Vision. Qingdao: IEEE, 2017. 506–516.

[16] Zou ZM, Liu KK, Wang L, Tang W. High-order graph convolutional networks for 3D human pose estimation. In: Proc. of the 31st British Machine Vision Conf. BMVA Press, 2020.

[17] Li H, Shi BW, Dai WR, Chen YB, Wang BT, Sun Y, Guo M, Li CL, Zou JN, Xiong HK. Hierarchical graph networks for 3D human pose estimation. In: Proc. of the 32nd British Machine Vision Conf. (BMVC). BMVA Press, 2021. 387.

[18] Chen XP, Lin KY, Liu WT, Qian C, Lin L. Weakly-supervised discovery of geometry-aware representation for 3D human pose estimation. In: Proc. of the 2019 IEEE/CVF Conf. on Computer Vision and Pattern Recognition (CVPR). Long Beach: IEEE, 2019. 10887–10896. [doi: 10.1109/CVPR.2019.01115]

[19] Chen YL, Wang ZC, Peng YX, Zhang ZQ, Yu G, Sun J. Cascaded pyramid network for multi-person pose estimation. In: Proc. of the 2018 IEEE/CVF Conf. on Computer Vision and Pattern Recognition (CVPR). Salt Lake City: IEEE, 2018. 7103–7112.

[20] Chen TL, Fang C, Shen XH, Zhu YH, Chen ZL, Luo JB. Anatomy-aware 3D human pose estimation with bone-based pose decomposition. IEEE Trans. on Circuits and Systems for Video Technology, 2022, 32(1): 198–209.

[21] Lin JH, Lee GH. Trajectory space factorization for deep video-based 3D human pose estimation. In: Proc. of the 30th British Machine Vision Conf. Cardiff: BMVA Press, 2019. 101.

[22] Li SC, Ke L, Pratama K, Tai YW, Tang CK, Cheng KT. Cascaded deep monocular 3D human pose estimation with evolutionary training data. In: Proc. of the 2020 IEEE/CVF Conf. on Computer Vision and Pattern Recognition (CVPR). Seattle: IEEE, 2020. 6172–6182.

[23] Zheng C, Zhu SJ, Mendieta M, Yang TJN, Chen C, Ding ZM. 3D Human pose estimation with spatial and temporal Transformers. In: Proc. of the 2021 IEEE/CVF Int’l Conf. on Computer Vision (ICCV). Montreal: IEEE, 2021. 11636–11645.

[24] Xu TH, Takano W. Graph stacked hourglass networks for 3D human pose estimation. In: Proc. of the 2021 IEEE/CVF Conf. on Computer Vision and Pattern Recognition (CVPR). Nashville: IEEE, 2021. 16100–16109. [doi: 10.1109/CVPR46437.2021.01584]

[25] Martinez J, Hossain R, Romero J, Little JJ. A simple yet effective baseline for 3D human pose estimation. In: Proc. of the 2017 IEEE Int’l Conf. on Computer Vision (ICCV). Venice: IEEE, 2017. 2659–2668. [doi: 10.1109/ICCV.2017.288]

[26] Liu KK, Ding RQ, Zou ZM, Wang L, Tang W. A comprehensive study of weight sharing in graph networks for 3D human pose estimation. In: Proc. of the 16th European Conf. on Computer Vision (ECCV). Glasgow: Springer, 2020. 318–334. [doi: 10.1007/978-3-030-58607-2_19]

[27] Zeng AL, Sun X, Yang L, Zhao NX, Liu MH, Xu Q. Learning skeletal graph neural networks for hard 3D pose estimation. In: Proc. of the 2021 IEEE/CVF Int’l Conf. on Computer Vision (ICCV). Montreal: IEEE, 2021. 11416–11425. [doi: 10.1109/ICCV48922.2021.01124]

[28] Li WH, Liu H, Tang H, Wang PC, van Gool L. MHFormer: Multi-hypothesis transformer for 3D human pose estimation. In: Proc. of the 2022 IEEE/CVF Conf. on Computer Vision and Pattern Recognition (CVPR). New Orleans: IEEE, 2022. 13137–13146.

[29] Shan WK, Liu ZH, Zhang XF, Wang SS, Ma SW, Gao W. P-STMO: Pre-trained spatial temporal many-to-one model for 3D human pose estimation. In: Proc. of the 17th European Conf. on Computer Vision (ECCV). Tel Aviv: Springer, 2022. 461–478. [doi: 10.1007/978-3-031-20065-6_27]

[30] Bai GH, Luo YM, Pan XL, Wang J, Guo JM. Real-time 3D human pose estimation without skeletal a priori structures. Image and Vision Computing, 2023, 132: 104649.

[31] Li H, Shi BW, Dai WR, Zheng HW, Wang BT, Sun Y, Guo M, Li CL, Zou JN, Xiong HK. Pose-oriented Transformer with uncertainty-guided refinement for 2D-to-3D human pose estimation. In: Proc. of the 37th AAAI Conf. on Artificial Intelligence. Washington: AAAI, 2023. 1296–1304. [doi: 10.1609/aaai.v37i1.25213]

[32] Han CC, Yu X, Gao CX, Sang N, Yang Y. Single image based 3D human pose estimation via uncertainty learning. Pattern Recognition, 2022, 132: 108934.

[33] Wang JB, Yan SJ, Xiong YJ, Lin DH. Motion guided 3D pose estimation from videos. In: Proc. of the 16th European Conf. on Computer Vision (ECCV). Glasgow: Springer, 2020. 764–780. [doi: 10.1007/978-3-030-58601-0_45]

[34] Yang HH, Liu HX, Zhang YM, Wu XJ. Hierarchical parallel multi-scale graph network for 3D human pose estimation. Applied Soft Computing, 2023, 140: 110267.

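Cite this article:

Yang HH, Liu HX, Zhang YM, Wu XJ. Parallel multi-scale spatio-temporal graph convolutional network for 3D human pose estimation. Ruan Jian Xue Bao/Journal of Software, 2025, 36(5): 2151–2166.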


Article metrics:
Views: 282
Downloads: 1740
HTML views: 0
Citations: 0

History:
Received: 2022-11-14
Last revised: 2023-07-20
Published online: 2024-06-20
