Abstract: Human pose estimation (HPE) methods based on graph convolutional networks (GCNs) cannot sufficiently aggregate the spatiotemporal features of skeleton joints, which restricts the extraction of discriminative features. To address this, this paper builds a parallel multi-scale spatio-temporal graph convolutional network (PMST-GNet) model to improve the performance of 3D HPE. First, a diagonally dominant spatiotemporal attention graph convolutional layer (DDA-STGConv) is designed to construct a cross-domain spatiotemporal adjacency matrix and to model joint features under self-constraint and attention-mechanism constraints, thereby enhancing information interaction among nodes. Then, a graph topology aggregation function is devised to construct different graph topologies, and a parallel multi-scale sub-graph network module (PM-SubGNet) is constructed with DDA-STGConv as its basic unit. Finally, a multi-scale feature cross fusion block (MFEB) is designed, through which multi-scale information can interact among PM-SubGNets to improve the feature representation of the GCN and thus better extract the context information of skeleton joints. Experimental results on the mainstream 3D HPE datasets Human3.6M and MPI-INF-3DHP show that the proposed PMST-GNet model performs well in 3D HPE and is superior to current mainstream GCN-based algorithms such as Sem-GCN, GraphSH, and UGCN.
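As a rough illustration of what a diagonally dominant cross-domain spatiotemporal adjacency matrix could look like, the following minimal sketch builds one over (frame, joint) nodes and applies a single propagation step. It is not the DDA-STGConv layer itself: the abstract does not give its formulation, so the edge rules (bone edges within a frame, same-joint edges between consecutive frames), the `margin` parameter, the row normalisation, and all function names are assumptions made for illustration, and the attention term and learnable weights are omitted.

```python
def spatiotemporal_adjacency(bones, num_joints, num_frames):
    """Adjacency over (frame, joint) nodes (assumed layout: node = t * num_joints + j).

    Spatial edges follow the skeleton bones within each frame; temporal edges
    link the same joint in consecutive frames.
    """
    n = num_joints * num_frames
    adj = [[0.0] * n for _ in range(n)]
    for t in range(num_frames):
        base = t * num_joints
        for i, j in bones:
            adj[base + i][base + j] = adj[base + j][base + i] = 1.0
        if t + 1 < num_frames:
            for j in range(num_joints):
                a, b = base + j, base + num_joints + j
                adj[a][b] = adj[b][a] = 1.0
    return adj


def make_diagonally_dominant(adj, margin=1.0):
    """Set each diagonal entry to the row's off-diagonal sum plus `margin`,
    then row-normalise so every row sums to 1 (hypothetical self-constraint)."""
    out = []
    for r, row in enumerate(adj):
        off_diag = sum(abs(v) for c, v in enumerate(row) if c != r)
        new_row = list(row)
        new_row[r] = off_diag + margin
        total = sum(new_row)
        out.append([v / total for v in new_row])
    return out


def graph_conv(adj, feats):
    """One propagation step: each node's output is the adjacency-weighted
    sum of all node feature vectors (no learnable weights in this sketch)."""
    dim = len(feats[0])
    return [[sum(a * f[k] for a, f in zip(row, feats)) for k in range(dim)]
            for row in adj]


# Toy example: a 3-joint chain observed over 2 frames, with 2-D joint features.
adj = make_diagonally_dominant(
    spatiotemporal_adjacency([(0, 1), (1, 2)], num_joints=3, num_frames=2))
features = [[float(n), 1.0] for n in range(6)]
aggregated = graph_conv(adj, features)
```

In this sketch, diagonal dominance means each node's own feature keeps more weight than all of its spatial and temporal neighbours combined, so aggregation mixes in context without washing out the joint's own signal.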