Posture Prior Driven Double-branch Network Model for Accurate Human Parsing

Fund Project:

National Natural Science Foundation of China (61825601, 61532009, 61672292); Jiangsu Provincial Project (BRA2019077, DZXX-037)

    Abstract:

    Human parsing aims to segment a human image into multiple parts with fine-grained semantics, providing a more detailed understanding of image content. When the body posture is complex, existing human parsing methods tend to misclassify limb parts, and their segmentation of small targets is not accurate enough. To address these problems, a double-branch network incorporating a pose prior is proposed for accurate human parsing. The model first uses a backbone network to extract features from the human image, and then uses the pose prior predicted by a human pose estimation model as attention information to form a multi-scale feature representation driven by the human body structure prior. The multi-scale features are fed separately into a fully convolutional network parsing branch and a detection parsing branch. The fully convolutional branch produces a global segmentation result, while the detection parsing branch focuses on detecting and segmenting small-scale targets. The segmentation results of the two branches are fused to obtain a more accurate final parsing result. Experimental results verify the effectiveness of the proposed algorithm. The approach achieves 52.19% mIoU on the LIP dataset and 68.29% mIoU on the ATR dataset, improving human parsing accuracy and producing more accurate segmentation of limb parts and small target parts.
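
The data flow described in the abstract (pose prior used as attention, two parsing branches, fusion, mIoU evaluation) can be illustrated with a small NumPy sketch. This is not the authors' implementation: the abstract does not state how the attention is applied or how the branches are fused, so the residual reweighting in `pose_attention`, the score averaging in `fuse_branches`, and all function and parameter names are assumptions made for illustration. `mean_iou` is a simplified single-image version of the metric; benchmark protocols usually accumulate statistics over the whole dataset.

```python
import numpy as np


def pose_attention(features, heatmaps):
    """Reweight backbone features with a pose prior.

    features: (C, H, W) feature map; heatmaps: (K, H, W) joint heatmaps in [0, 1].
    Assumption: the prior is collapsed to one spatial map (max over joints)
    and applied residually, so regions far from any joint are left unchanged.
    """
    attention = heatmaps.max(axis=0, keepdims=True)  # (1, H, W)
    return features * (1.0 + attention)


def fuse_branches(global_scores, detection_scores, detection_mask):
    """Fuse per-class score maps of shape (N, H, W) into a label map (H, W).

    Assumption: scores are averaged where the detection branch produced
    output (detection_mask is True); elsewhere the global scores are kept.
    """
    averaged = 0.5 * (global_scores + detection_scores)
    fused = np.where(detection_mask[None, :, :], averaged, global_scores)
    return fused.argmax(axis=0)


def mean_iou(pred, target, num_classes):
    """Mean IoU over the classes that occur in pred or target (single image)."""
    ious = []
    for c in range(num_classes):
        union = np.logical_or(pred == c, target == c).sum()
        if union > 0:
            inter = np.logical_and(pred == c, target == c).sum()
            ious.append(inter / union)
    return float(np.mean(ious))


# Toy example: 2 classes on a 4x4 image.
global_scores = np.zeros((2, 4, 4))
global_scores[0] = 1.0                      # global branch: background everywhere
detection_scores = np.zeros((2, 4, 4))
detection_scores[1, :2, :2] = 3.0           # detection branch: small part, top-left
detection_mask = np.zeros((4, 4), dtype=bool)
detection_mask[:2, :2] = True

labels = fuse_branches(global_scores, detection_scores, detection_mask)
target = np.zeros((4, 4), dtype=int)
target[:2, :2] = 1
print(labels)
print(mean_iou(labels, target, num_classes=2))
```

In this toy case the global branch alone would miss the small top-left region, and the fused label map recovers it, which is the kind of small-target behaviour the abstract attributes to the detection parsing branch.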

Get Citation

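Gao Mingda, Sun Yubao, Liu Qingshan, Shao Xiaowen. Posture prior driven double-branch network model for accurate human parsing. Ruan Jian Xue Bao/Journal of Software, 2020,31(7):1959-1968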

History
  • Received: April 30, 2019
  • Revised: July 11, 2019
  • Online: January 17, 2020
  • Published: July 06, 2020