基于语义调制的弱监督语义分割
作者: 李军侠, 苏京峰, 崔滢, 刘青山
中图分类号: TP391
基金项目: 国家重点研发计划(2022YFC2405600); 国家自然科学基金(62272235, 62102364, U21B2044); 浙江省自然科学基金(LY22F020016)


Semantic-modulation-based Weakly Supervised Semantic Segmentation
Author: Li JX, Su JF, Cui Y, Liu QS
    摘要:

    图像级标注下的弱监督语义分割方法通常采用卷积神经网络(CNN)生成类激活图以精确定位目标位置, 其面临的主要挑战在于CNN对全局信息的感知能力不足, 导致前景区域过小. 近年来, 基于Transformer的弱监督语义分割方法利用自注意力机制捕捉全局依赖关系, 弥补了CNN的这一固有缺陷. 然而, Transformer生成的初始类激活图会在目标区域周围引入大量背景噪声, 直接使用初始类激活图并不能取得令人满意的效果. 为此, 本文综合利用Transformer生成的类与块间注意力(class-to-patch attention)以及区域块间注意力(patch-to-patch attention)对初始类激活图进行联合优化; 同时, 针对原始类与块间注意力存在的误差, 设计一种语义调制策略, 利用区域块间注意力的语义上下文信息对类与块间注意力进行调制以修正其误差, 最终得到能够准确覆盖较多目标区域的类激活图. 在此基础上, 构建一种新颖的基于Transformer的弱监督语义分割模型. 所提方法在PASCAL VOC 2012验证集和测试集上的mIoU分别达到72.7%和71.9%, 在MS COCO 2014验证集上的mIoU为42.3%, 取得了目前较为先进的弱监督语义分割结果.

    Abstract:

    Image-level weakly supervised semantic segmentation methods usually use convolutional neural networks (CNNs) to generate class activation maps that localize targets. However, CNNs have a limited capacity to perceive global information, which leads to undersized foreground regions. Recently, Transformer-based methods have exploited self-attention to capture global dependencies, addressing this inherent limitation of CNNs. Nevertheless, the initial class activation map generated by a Transformer often introduces substantial background noise around the target area, so using it directly yields unsatisfactory results. This study jointly exploits the class-to-patch and patch-to-patch attention produced by a Transformer to refine the initial class activation map. In addition, because the raw class-to-patch attention contains errors, a semantic modulation strategy is designed that uses the semantic context carried by the patch-to-patch attention to correct them. The result is a class activation map that accurately covers more of the target regions. On this basis, a novel Transformer-based model for weakly supervised semantic segmentation is constructed. The proposed method reaches mIoU values of 72.7% and 71.9% on the PASCAL VOC 2012 validation and test sets, respectively, and 42.3% on the MS COCO 2014 validation set, which is competitive with the state of the art in weakly supervised semantic segmentation.
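    摘要中描述的"语义调制"思路可以用一段示意代码来说明: 先用区域块间注意力(patch-to-patch attention)汇聚的语义上下文对类与块间注意力(class-to-patch attention)加权调制, 再以块间亲和关系传播初始类激活图. 以下为最简 NumPy 示意(函数名、张量形状约定与具体调制公式均为笔者假设, 并非论文的精确算法):

    ```python
    import numpy as np

    def refine_cam(cam, c2p, p2p, alpha=1.0):
        """示意: 用 patch-to-patch 注意力的语义上下文调制 class-to-patch 注意力,
        再联合优化初始类激活图 (CAM).

        cam: (C, N) 初始 CAM, C 为类别数, N 为图像块数
        c2p: (C, N) 类与块间注意力
        p2p: (N, N) 区域块间注意力 (假设每行已归一化)
        """
        # 语义调制: 以每个图像块从其他块获得的注意力支持度作为上下文权重,
        # 对 class-to-patch 注意力重新加权, 抑制其误差.
        context = p2p.sum(axis=0)
        context = context / (context.max() + 1e-8)
        c2p_mod = c2p * (context ** alpha)
        # 联合优化: 将调制后的注意力与初始 CAM 融合,
        # 并沿块间亲和关系传播, 以覆盖更多目标区域.
        refined = (cam * c2p_mod) @ p2p.T
        # 逐类 min-max 归一化到 [0, 1].
        refined = refined - refined.min(axis=1, keepdims=True)
        refined = refined / (refined.max(axis=1, keepdims=True) + 1e-8)
        return refined
    ```

    该示意只体现两类注意力的分工: p2p 提供语义上下文用于修正 c2p, 修正后的注意力再与 CAM 联合传播; 真实模型中这些注意力来自 Transformer 各层的自注意力矩阵.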

引用本文

李军侠, 苏京峰, 崔滢, 刘青山. 基于语义调制的弱监督语义分割. 软件学报: 1–15.

历史
  • 收稿日期:2023-09-08
  • 最后修改日期:2024-01-11
  • 在线发布日期: 2025-01-08