Posture Prior Driven Double-branch Network Model for Accurate Human Parsing

doi:10.13328/j.cnki.jos.005933


Ruan Jian Xue Bao/Journal of Software, Volume 31, Issue 7, 2020, pp. 1959-1968.

Author: GAO Ming-Da, SUN Yu-Bao, LIU Qing-Shan, SHAO Xiao-Wen
Affiliation: Jiangsu Key Laboratory of Big Data Analysis Technology (School of Automation, Nanjing University of Information Science and Technology), Nanjing 210044, China; Jiangsu Province Atmospheric Environment and Equipment Technology Collaborative Innovation Center (School of Automation, Nanjing University of Information Science and Technology), Nanjing 210044, China
Fund Project: National Natural Science Foundation of China (61825601, 61532009, 61672292); Jiangsu Provincial Project (BRA2019077, DZXX-037)


Abstract:

Human parsing aims to segment a human image into multiple parts with fine-grained semantics, providing a more detailed understanding of image content. When the body posture is complex, existing human parsing methods tend to misclassify limb parts, and their segmentation of small targets is not accurate enough. To address these problems, a double-branch network that incorporates a posture prior is proposed for accurate human parsing. The model first uses a backbone network to extract features from the human image, and then uses the pose prior predicted by a human pose estimation model as attention information to form a multi-scale feature representation driven by the human body structure prior. The multi-scale features are fed separately into a fully convolutional network parsing branch and a detection parsing branch. The fully convolutional branch produces the global segmentation result, while the detection parsing branch focuses on detecting and segmenting small-scale targets. The segmentation results of the two branches are fused to obtain the final, more accurate parsing result. Experimental results verify the effectiveness of the proposed algorithm: the approach achieves 52.19% mIoU on the LIP dataset and 68.29% mIoU on the ATR dataset, improving human parsing accuracy and yielding more accurate segmentation of limb parts and small target parts.
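The data flow described in the abstract, together with the mIoU metric it reports, can be illustrated with a minimal pure-Python sketch. This is not the authors' implementation: the abstract does not specify how the pose prior is applied as attention or how the two branches are fused, so the residual form feature * (1 + prior), the weighted-average fusion, and all names (apply_pose_attention, fuse_and_label, mean_iou, det_weight) are assumptions made purely for illustration. The mean_iou function follows the standard definition of mean intersection over union, averaged over classes that occur in the prediction or the ground truth.

```python
from typing import List

Grid = List[List[float]]  # one H x W map


def apply_pose_attention(feature: Grid, pose_prior: Grid) -> Grid:
    """Reweight a feature map by a pose-prior map with values in [0, 1].

    Assumption: residual attention, feature * (1 + prior), so locations
    without pose evidence keep their original response.
    """
    return [[f * (1.0 + p) for f, p in zip(f_row, p_row)]
            for f_row, p_row in zip(feature, pose_prior)]


def fuse_and_label(fcn_scores: List[Grid], det_scores: List[Grid],
                   det_weight: float = 0.5) -> List[List[int]]:
    """Fuse per-class score maps from two branches and pick a label per pixel.

    Assumption: fusion is a weighted average of the two branches' scores;
    the source only states that the two results are fused.
    """
    num_classes = len(fcn_scores)
    height, width = len(fcn_scores[0]), len(fcn_scores[0][0])
    labels = []
    for y in range(height):
        row = []
        for x in range(width):
            fused = [(1.0 - det_weight) * fcn_scores[c][y][x]
                     + det_weight * det_scores[c][y][x]
                     for c in range(num_classes)]
            # Index of the highest fused score (first index wins on ties).
            row.append(max(range(num_classes), key=fused.__getitem__))
        labels.append(row)
    return labels


def mean_iou(pred: List[List[int]], target: List[List[int]],
             num_classes: int) -> float:
    """Mean intersection over union across classes present in pred or target."""
    ious = []
    for c in range(num_classes):
        inter = union = 0
        for p_row, t_row in zip(pred, target):
            for p, t in zip(p_row, t_row):
                inter += int(p == c and t == c)
                union += int(p == c or t == c)
        if union > 0:  # skip classes absent from both maps
            ious.append(inter / union)
    return sum(ious) / len(ious) if ious else 0.0


# Toy 2-class, 2 x 2 example with made-up scores.
fcn = [[[0.9, 0.6], [0.2, 0.1]], [[0.1, 0.4], [0.8, 0.9]]]
det = [[[0.8, 0.2], [0.3, 0.1]], [[0.2, 0.8], [0.7, 0.9]]]
labels = fuse_and_label(fcn, det)
score = mean_iou(labels, [[0, 0], [1, 1]], num_classes=2)
```

In this toy example the detection branch overrides the fully convolutional branch at one pixel, which mirrors the idea that the second branch can correct the global result on small regions. A real system would operate on learned multi-scale feature tensors rather than hand-written lists.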

Key words: human parsing; semantic segmentation; human pose estimation; object detection; convolutional neural network


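Citation:

GAO Ming-Da, SUN Yu-Bao, LIU Qing-Shan, SHAO Xiao-Wen. Posture Prior Driven Double-branch Network Model for Accurate Human Parsing. Ruan Jian Xue Bao/Journal of Software, 2020, 31(7): 1959-1968.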


History

Received: April 30, 2019
Revised: July 11, 2019
Online: January 17, 2020
Published: July 06, 2020
