基于语义调制的弱监督语义分割
作者: 李军侠, 苏京峰, 崔滢, 刘青山
中图分类号: TP391
基金项目: 国家重点研发计划(2022YFC2405600); 国家自然科学基金(62272235, 62102364, U21B2044); 浙江省自然科学基金(LY22F020016)


Semantic-modulation-based Weakly Supervised Semantic Segmentation
Author: Li JX, Su JF, Cui Y, Liu QS
    摘要:

    图像级标注下的弱监督语义分割方法通常采用卷积神经网络(CNN)生成类激活图以精确定位目标位置, 其面临的主要挑战在于CNN对全局信息的感知能力不足, 导致前景区域过小. 近年来, 基于Transformer的弱监督语义分割方法利用自注意力机制捕捉全局依赖关系, 弥补了CNN的这一固有缺陷. 然而, Transformer生成的初始类激活图会在目标区域周围引入大量背景噪声, 直接使用初始类激活图并不能取得令人满意的效果. 为此, 本文综合利用Transformer生成的类与块间注意力(class-to-patch attention)以及区域块间注意力(patch-to-patch attention)对初始类激活图进行联合优化; 同时, 针对原始类与块间注意力存在的误差, 设计一种语义调制策略, 利用区域块间注意力的语义上下文信息对类与块间注意力进行调制以修正其误差, 最终得到能够准确覆盖较多目标区域的类激活图. 在此基础上, 构建一种新颖的基于Transformer的弱监督语义分割模型. 所提方法在PASCAL VOC 2012验证集和测试集上的mIoU分别达到72.7%和71.9%, 在MS COCO 2014验证集上的mIoU为42.3%, 取得了目前较为先进的弱监督语义分割结果.

    Abstract:

    Image-level weakly supervised semantic segmentation methods usually use convolutional neural networks (CNNs) to generate class activation maps that localize targets. However, CNNs have a limited capacity to perceive global information, which leads to undersized foreground regions. Recently, Transformer-based methods have exploited self-attention to capture global dependencies, addressing this inherent limitation of CNNs. Nevertheless, the initial class activation map generated by a Transformer often introduces substantial background noise around the target area, so using it directly yields unsatisfactory results. This study jointly exploits the class-to-patch and patch-to-patch attention produced by a Transformer to refine the initial class activation map. In addition, because the raw class-to-patch attention contains errors, a semantic modulation strategy is designed that uses the semantic context carried by the patch-to-patch attention to correct them. The result is a class activation map that accurately covers more of the target regions. On this basis, a novel Transformer-based model for weakly supervised semantic segmentation is constructed. The proposed method reaches mIoU values of 72.7% and 71.9% on the PASCAL VOC 2012 validation and test sets, respectively, and 42.3% on the MS COCO 2014 validation set, which is competitive with the state of the art in weakly supervised semantic segmentation.
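    摘要中描述的"语义调制"思路可以用一段示意代码来说明: 先用区域块间注意力(patch-to-patch attention)汇聚的语义上下文对类与块间注意力(class-to-patch attention)加权调制, 再以块间亲和关系传播初始类激活图. 以下为最简 NumPy 示意(函数名、张量形状约定与具体调制公式均为笔者假设, 并非论文的精确算法):

    ```python
    import numpy as np

    def refine_cam(cam, c2p, p2p, alpha=1.0):
        """示意: 用 patch-to-patch 注意力的语义上下文调制 class-to-patch 注意力,
        再联合优化初始类激活图 (CAM).

        cam: (C, N) 初始 CAM, C 为类别数, N 为图像块数
        c2p: (C, N) 类与块间注意力
        p2p: (N, N) 区域块间注意力 (假设每行已归一化)
        """
        # 语义调制: 以每个图像块从其他块获得的注意力支持度作为上下文权重,
        # 对 class-to-patch 注意力重新加权, 抑制其误差.
        context = p2p.sum(axis=0)
        context = context / (context.max() + 1e-8)
        c2p_mod = c2p * (context ** alpha)
        # 联合优化: 将调制后的注意力与初始 CAM 融合,
        # 并沿块间亲和关系传播, 以覆盖更多目标区域.
        refined = (cam * c2p_mod) @ p2p.T
        # 逐类 min-max 归一化到 [0, 1].
        refined = refined - refined.min(axis=1, keepdims=True)
        refined = refined / (refined.max(axis=1, keepdims=True) + 1e-8)
        return refined
    ```

    该示意只体现两类注意力的分工: p2p 提供语义上下文用于修正 c2p, 修正后的注意力再与 CAM 联合传播; 真实模型中这些注意力来自 Transformer 各层的自注意力矩阵.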

引用本文

李军侠, 苏京峰, 崔滢, 刘青山. 基于语义调制的弱监督语义分割. 软件学报: 1–15.

历史
  • 收稿日期:2023-09-08
  • 最后修改日期:2024-01-11
  • 在线发布日期: 2025-01-08