Grid Dividing for Single-stage Instance Segmentation
Authors:

WANG Wen-Hai (1994-), male, Ph.D., CCF student member; his research interests include scene text detection, instance segmentation, and neural networks. LI Zhi-Qi (1998-), male, Ph.D. candidate; his research interests include instance segmentation and panoptic segmentation. LU Tong (1976-), male, Ph.D., professor, doctoral supervisor, CCF senior member; his research interests include computer vision and scene text detection.

Corresponding author:

LU Tong, lutong@nju.edu.cn

CLC number:

TP391

Fund program:

National Natural Science Foundation of China (61672273, 61832008)


Abstract:

In recent years, compared with two-stage instance segmentation methods, single-stage methods have made preliminary progress in real-world applications thanks to their real-time performance, but two main drawbacks remain. (1) Low accuracy: single-stage methods lack multiple rounds of refinement, so their accuracy still falls short of practical requirements. (2) Low flexibility: most existing single-stage methods are designed in isolation and are hard to combine with the different types of object detection frameworks used in practice, which limits their applicability. This study proposes an accurate and flexible single-stage instance segmentation framework, the grid instance segmentation method (GridMask), with the following two key designs. (1) To improve instance segmentation accuracy, a grid dividing binarization algorithm is proposed, which divides the region inside an object's bounding box into multiple independent grid cells and then performs instance segmentation on each cell. This step simplifies full-object segmentation into the segmentation of multiple grid slices, effectively reducing the complexity of the feature representation and thereby improving segmentation accuracy. (2) To be compatible with different object detection methods, a plug-and-play sub-network module is designed, which can be seamlessly plugged into most mainstream object detection frameworks to enhance their segmentation performance. The proposed method achieves excellent performance on the public MS COCO dataset, outperforming most existing single-stage methods and even some two-stage methods.
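The grid dividing idea can be illustrated with a minimal sketch: split a box-cropped soft mask into an s×s grid, binarize each cell independently, and stitch the cells back together. The function names, cell layout, and fixed threshold below are illustrative assumptions, not the paper's actual design.

```python
import numpy as np

def grid_divide(box_mask, s=4):
    """Split a box-cropped soft mask (H, W) into an s x s grid of cells,
    returned in row-major order. Boundaries come from an even split."""
    h, w = box_mask.shape
    ys = np.linspace(0, h, s + 1, dtype=int)
    xs = np.linspace(0, w, s + 1, dtype=int)
    return [box_mask[ys[i]:ys[i + 1], xs[j]:xs[j + 1]]
            for i in range(s) for j in range(s)]

def assemble_binary(cells, s, threshold=0.5):
    """Binarize each grid cell independently, then stitch the cells
    back into one full binary mask for the bounding-box region."""
    rows = [np.concatenate([(c >= threshold).astype(np.uint8)
                            for c in cells[i * s:(i + 1) * s]], axis=1)
            for i in range(s)]
    return np.concatenate(rows, axis=0)
```

Here every cell shares one fixed threshold for simplicity; in the method described above, each grid cell would carry its own independently predicted mask, which is what reduces the complexity of the per-cell representation.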

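The plug-and-play design can likewise be sketched as a detector-agnostic head plus a wrapper. All names below are hypothetical, and the random linear map stands in for learned convolutions; this is only a shape-level sketch of how such a module could attach to an arbitrary detector.

```python
import numpy as np

class GridMaskHead:
    """Hypothetical detector-agnostic mask head: maps pooled box
    features (N, C, H, W) to s*s grid-cell mask logits per box."""
    def __init__(self, in_channels, s=4, seed=0):
        self.s = s
        # Stand-in for learned 1x1 conv weights: C channels -> s*s maps.
        self.w = np.random.default_rng(seed).standard_normal((in_channels, s * s))

    def __call__(self, roi_feats):
        n, c, h, w = roi_feats.shape
        flat = roi_feats.transpose(0, 2, 3, 1).reshape(-1, c)  # (N*H*W, C)
        logits = flat @ self.w                                 # (N*H*W, s*s)
        return logits.reshape(n, h, w, -1).transpose(0, 3, 1, 2)

def attach_mask_head(detect_fn, head):
    """Wrap any detector callable (image -> boxes, pooled box features)
    so that it also returns grid-cell mask logits."""
    def detector_with_masks(image):
        boxes, roi_feats = detect_fn(image)
        return boxes, head(roi_feats)
    return detector_with_masks
```

Because the head only consumes box-aligned features and returns per-box logits, it does not depend on how the detector produced those features, which is the property that makes such a module pluggable into different detection frameworks.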

Cite this article:

Wang WH, Li ZQ, Lu T. Grid dividing for single-stage instance segmentation. Ruan Jian Xue Bao/Journal of Software, 2023, 34(6): 2906–2921 (in Chinese with English abstract).
History
  • Received: 2021-06-04
  • Revised: 2021-07-23
  • Online: 2022-10-14
  • Published: 2023-06-06