Posture Prior Driven Double-branch Network Model for Accurate Human Parsing

doi:10.13328/j.cnki.jos.005933

Journal of Software (Ruan Jian Xue Bao), 2020, Vol. 31, No. 7, pp. 1959-1968

Authors: GAO Ming-Da, SUN Yu-Bao, LIU Qing-Shan, SHAO Xiao-Wen
Affiliations (all authors): Jiangsu Key Laboratory of Big Data Analysis Technology (School of Automation, Nanjing University of Information Science and Technology), Nanjing 210044, Jiangsu, China; Jiangsu Province Atmospheric Environment and Equipment Technology Collaborative Innovation Center (School of Automation, Nanjing University of Information Science and Technology), Nanjing 210044, Jiangsu, China
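About the authors: GAO Ming-Da (1994-), female, master's degree, research interest: image segmentation; LIU Qing-Shan (1975-), male, Ph.D., professor, doctoral supervisor, CCF professional member, research interests: image and video understanding, pattern recognition; SUN Yu-Bao (1983-), male, Ph.D., associate professor, CCF professional member, research interests: deep learning theory, compressed sensing reconstruction, human parsing; SHAO Xiao-Wen (1996-), female, master's degree, research interest: person re-identification.
Corresponding author: SUN Yu-Bao, E-mail: sunyb@nuist.edu.cn
Funding: National Natural Science Foundation of China (61825601, 61532009, 61672292); Jiangsu Provincial Project (BRA2019077, DZXX-037)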

Posture Prior Driven Double-branch Network Model for Accurate Human Parsing

Author:

GAO Ming-Da, SUN Yu-Bao, LIU Qing-Shan, SHAO Xiao-Wen

Affiliation:

Jiangsu Key Laboratory of Big Data Analysis Technology (School of Automation, Nanjing University of Information Science and Technology), Nanjing 210044, China; Jiangsu Province Atmospheric Environment and Equipment Technology Collaborative Innovation Center (School of Automation, Nanjing University of Information Science and Technology), Nanjing 210044, China

Fund Project:

National Natural Science Foundation of China (61825601, 61532009, 61672292); Jiangsu Provincial Project (BRA2019077, DZXX-037)


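Abstract (from the Chinese):

Human parsing aims to segment a human image into multiple part regions with fine-grained semantics, thereby forming a semantic understanding of the image. However, because human poses are complex, existing human parsing algorithms tend to misclassify limb parts, and their segmentation of small target regions is not precise enough. To address these problems, a double-branch network model for accurate human parsing is proposed that incorporates human pose estimation information. The model first represents the human image with a backbone network and uses the pose prior predicted by a human pose estimation model as attention information for the backbone, forming a multi-scale feature representation driven by the human structural prior. The extracted features are then fed separately into a fully convolutional network parsing branch and a detection parsing branch. The fully convolutional network parsing branch produces the global segmentation result, while the detection parsing branch focuses on detecting and segmenting small-scale targets; fusing the predictions of the two branches yields a more accurate segmentation result. Experiments verify the effectiveness of the algorithm: on the mainstream human parsing datasets LIP and ATR, the proposed method reaches mIoU scores of 52.19% and 68.29%, respectively, effectively improving parsing accuracy and producing more accurate segmentation of limb parts and small target parts.

Keywords: human parsing; semantic segmentation; human pose estimation; part detection; convolutional neural network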
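The two mechanisms named in the abstract, a pose prior used as attention on backbone features and the fusion of two branch predictions, can be illustrated with a minimal sketch. The abstract does not give the exact attention form or fusion rule, so everything here is an assumption for illustration: the function names `apply_pose_attention` and `fuse_branches`, the residual reweighting `f * (1 + a)`, and the weighted-sum fusion followed by a per-pixel argmax are hypothetical and not taken from the paper.

```python
from typing import List

Grid = List[List[float]]


def apply_pose_attention(feature: Grid, pose_heatmap: Grid) -> Grid:
    """Reweight a single-channel feature map with a pose heatmap.

    Assumed form: f * (1 + a) with a in [0, 1], so locations near
    predicted joints are amplified and all others pass through unchanged.
    """
    return [[f * (1.0 + a) for f, a in zip(f_row, a_row)]
            for f_row, a_row in zip(feature, pose_heatmap)]


def fuse_branches(global_scores: List[Grid],
                  detection_scores: List[Grid],
                  detection_weight: float = 1.0) -> List[List[int]]:
    """Fuse per-class score maps from two branches into a label map.

    Assumed rule: add the detection-branch scores (scaled by
    detection_weight) to the global-branch scores, then take the
    per-pixel argmax over classes.
    """
    num_classes = len(global_scores)
    height, width = len(global_scores[0]), len(global_scores[0][0])
    labels: List[List[int]] = []
    for y in range(height):
        row: List[int] = []
        for x in range(width):
            fused = [global_scores[c][y][x]
                     + detection_weight * detection_scores[c][y][x]
                     for c in range(num_classes)]
            row.append(max(range(num_classes), key=fused.__getitem__))
        labels.append(row)
    return labels


# Toy example: a 2x2 feature map and a pose heatmap of the same size.
attended = apply_pose_attention([[1.0, 2.0], [3.0, 4.0]],
                                [[0.0, 0.5], [1.0, 0.0]])
print(attended)

# Toy example: two classes on a 1x3 image. The detection branch adds
# evidence for class 1 at the middle pixel only.
global_scores = [[[0.9, 0.6, 0.2]],   # class 0 (background)
                 [[0.1, 0.4, 0.8]]]   # class 1 (a small part)
detection_scores = [[[0.0, 0.0, 0.0]],
                    [[0.0, 0.5, 0.0]]]
print(fuse_branches(global_scores, detection_scores))
```

In the second toy example the global branch alone would label the middle pixel as background; the extra detection-branch evidence moves it to the part class, which is the kind of small-target correction the abstract attributes to the fusion step.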

Abstract:

Human parsing aims to segment a human image into multiple parts with fine-grained semantics and provides a more detailed understanding of image content. When the human body posture is complicated, existing human parsing methods tend to misjudge the limb parts, and their segmentation of small targets is not accurate enough. To solve these problems, a double-branch network that incorporates a posture prior is proposed for accurate human parsing. The model first uses a backbone network to extract features of the human image, and then uses the pose prior predicted by a human pose estimation model as attention information to form a multi-scale feature representation driven by the human body structure prior. The multi-scale features are fed separately into a fully convolutional network parsing branch and a detection parsing branch. The fully convolutional network parsing branch obtains the global segmentation result, and the detection parsing branch pays more attention to the detection and segmentation of small-scale targets. The segmentation results of the two branches are fused to obtain a more accurate final parsing result. The experimental results verify the effectiveness of the proposed algorithm. The approach achieves 52.19% mIoU on the LIP dataset and 68.29% mIoU on the ATR dataset, which effectively improves human parsing accuracy and yields more accurate segmentation of human limb parts and small target parts.

Key words: human parsing; semantic segmentation; human pose estimation; object detection; convolutional neural network
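The mIoU figures quoted in the abstract are mean intersection-over-union scores. As a rough illustration of how such a score is computed, here is a minimal sketch for a single pair of label maps. The function name `mean_iou` is hypothetical, and the averaging convention (skip classes that appear in neither map) is an assumption; benchmark evaluations typically accumulate counts over a whole dataset, and the exact protocol used for LIP and ATR is not stated on this page.

```python
from typing import Sequence


def mean_iou(pred: Sequence[Sequence[int]],
             target: Sequence[Sequence[int]],
             num_classes: int) -> float:
    """Mean IoU over the classes that occur in pred or target."""
    intersection = [0] * num_classes
    union = [0] * num_classes
    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            if p == t:
                # Pixel counts once toward both intersection and union.
                intersection[p] += 1
                union[p] += 1
            else:
                # Pixel belongs to the union of both classes involved.
                union[p] += 1
                union[t] += 1
    ious = [i / u for i, u in zip(intersection, union) if u > 0]
    return sum(ious) / len(ious) if ious else 0.0


# Toy 2x3 label maps with three classes (0 = background).
pred = [[0, 1, 1],
        [0, 2, 2]]
target = [[0, 1, 2],
          [0, 2, 2]]
print(f"mIoU = {mean_iou(pred, target, 3):.4f}")  # prints "mIoU = 0.7222"
```

Here the per-class IoUs are 2/2, 1/2, and 2/3, so the mean is about 0.7222. A single mislabelled pixel lowers the IoU of both the predicted and the true class, which is why small parts with few pixels weigh heavily on this metric.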


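Cite this article

GAO Ming-Da, SUN Yu-Bao, LIU Qing-Shan, SHAO Xiao-Wen. Posture prior driven double-branch network model for accurate human parsing. Ruan Jian Xue Bao/Journal of Software, 2020,31(7):1959-1968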

Article metrics

Views: 2882
Downloads: 6309
HTML views: 3726
Citations: 0

History

Received: 2019-04-30
Last revised: 2019-07-11
Published online: 2020-01-17
Published: 2020-07-06
