Abstract: Depth ambiguity is a major challenge for multi-person three-dimensional (3D) pose estimation from single-frame images, and extracting contexts from an image has great potential for alleviating it. Current top-down approaches usually model key point relationships based on human detection, which not only easily results in key point shifting or mismatching but also undermines the reliability of absolute depth estimation from the human scale factor, because the coarse-grained human bounding box contains large background noise. Bottom-up approaches directly detect human key points from an image and then recover the 3D human poses one by one. However, although these approaches can obtain the scene context explicitly, they are at a disadvantage in relative depth estimation. This study proposes HSC-Pose, a new two-branch network in which human context based on key point region proposals and scene context based on 3D space are extracted by top-down and bottom-up branches, respectively. A noise-resistant human context extraction method is proposed to describe the human body by modeling key point region proposals. A dynamic sparse key point relationship for pose association is modeled to eliminate weak connections and reduce noise propagation. A scene context extraction method based on a bird’s-eye view is also proposed: the human position layout in 3D space is obtained by modeling the image’s depth features and mapping them to a bird’s-eye-view plane. A network fusing the human and scene contexts is designed to predict absolute human depth. Experiments on the public MuPoTS-3D and Human3.6M datasets show that, compared with state-of-the-art models, HSC-Pose improves the relative and absolute position accuracies of 3D key points by at least 2.2% and 0.5%, respectively, and reduces the mean root position error of the key points by at least 4.2 mm.
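The bird’s-eye-view mapping mentioned above can be illustrated with a minimal sketch: per-pixel depth features are back-projected into 3D camera coordinates under a pinhole camera model and pooled onto a ground-plane (X-Z) grid, yielding a coarse human position layout. This is an illustrative assumption, not the paper’s exact formulation; the function name `depth_to_bev`, the grid parameters, and the max-pooling choice are all hypothetical.

```python
# Minimal NumPy-only sketch of mapping depth features to a bird's-eye-view
# (BEV) plane. Assumptions (not from the paper): pinhole intrinsics, an
# 8 m x 8 m ground-plane window, a 64 x 64 grid, and max pooling per cell.
import numpy as np

def depth_to_bev(depth, features, fx, fy, cx, cy,
                 x_range=(-4.0, 4.0), z_range=(0.0, 8.0), grid=64):
    """Pool per-pixel features onto a BEV (X-Z) grid using the depth map.

    depth:    (H, W) metric depth per pixel
    features: (H, W, C) per-pixel feature map (e.g., from a CNN backbone)
    fx, fy, cx, cy: pinhole camera intrinsics
    Returns a (grid, grid, C) BEV feature map.
    """
    H, W, C = features.shape
    u, v = np.meshgrid(np.arange(W), np.arange(H))
    z = depth
    x = (u - cx) / fx * z  # back-project to camera-frame X
    y = (v - cy) / fy * z  # camera-frame Y (height) is discarded on the BEV plane

    # Convert metric X/Z coordinates to integer grid cell indices.
    gx = ((x - x_range[0]) / (x_range[1] - x_range[0]) * grid).astype(int)
    gz = ((z - z_range[0]) / (z_range[1] - z_range[0]) * grid).astype(int)
    valid = (gx >= 0) & (gx < grid) & (gz >= 0) & (gz < grid)

    bev = np.zeros((grid, grid, C), dtype=features.dtype)
    # Max-pool the features of all pixels that fall into the same BEV cell.
    np.maximum.at(bev, (gz[valid], gx[valid]), features[valid])
    return bev

# Toy usage with random depth/features, just to show the shapes involved.
rng = np.random.default_rng(0)
depth = rng.uniform(1.0, 7.0, size=(120, 160))
feats = rng.standard_normal((120, 160, 8)).astype(np.float32)
bev = depth_to_bev(depth, feats, fx=200.0, fy=200.0, cx=80.0, cy=60.0)
print(bev.shape)  # (64, 64, 8)
```

In such a layout, people at similar image positions but different depths land in different BEV cells, which is what makes the scene context useful for absolute depth prediction.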