Supported by the National Natural Science Foundation of China (61673084) and the Natural Science Foundation of Liaoning Province (20170540192, 20180550866)
A Transformer model based on a convolutional neural network (CNN) is proposed to solve the panoptic segmentation task. The method draws on the innate advantages of CNNs in image feature learning and avoids the increase in computation caused by transplanting the Transformer into vision tasks. The CNN-based Transformer model is composed of two basic structures: a projector that performs the feature domain transformation and an extractor responsible for feature extraction; the effective combination of the projector and the extractor forms the network framework of the model. The projector is implemented by a lattice convolution model, which simulates the spatial relations of an image by designing and optimizing the convolution filters. The extractor is implemented by a chain network, which improves the feature extraction capability by stacking chain units. Based on the structure and function of panoptic segmentation, a CNN-based panoptic segmentation Transformer network is constructed. Experimental results on the MS COCO and Cityscapes datasets show that the proposed method achieves excellent performance.
This study proposes a convolutional neural network (CNN) based Transformer to solve the panoptic segmentation task. The method draws on the inherent advantages of CNNs in image feature learning and avoids the increase in computational cost incurred when the Transformer is transplanted to vision tasks. The CNN-based Transformer consists of two basic structures: a projector, which performs the feature domain transformation, and an extractor, which is responsible for feature extraction. The effective combination of the projector and the extractor forms the framework of the CNN-based Transformer. Specifically, the projector is implemented by a lattice convolution that models the spatial relations of an image by designing and optimizing the convolution filter configuration, and the extractor is implemented by a chain network that improves the feature extraction capability by stacking chain blocks. Considering the structure and function of panoptic segmentation, the CNN-based Transformer is successfully applied to the panoptic segmentation task. Experimental results on the MS COCO and Cityscapes datasets demonstrate that the proposed method achieves excellent performance.
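The projector/extractor decomposition described above can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names `projector`, `chain_unit`, and `extractor`, the residual link, and the ReLU activation are all assumptions made for illustration. The sketch only shows the control flow of the abstract: a convolution-style projector that models local spatial relations with a filter, and an extractor built by stacking chain units.

```python
import numpy as np

def projector(x, kernel):
    """Hypothetical lattice-convolution projector: transforms the feature
    domain while modeling local spatial relations with a 2D filter.
    Uses same-size zero padding so the spatial shape is preserved."""
    kh, kw = kernel.shape
    ph, pw = kh // 2, kw // 2
    padded = np.pad(x, ((ph, ph), (pw, pw)), mode="constant")
    out = np.zeros_like(x)
    h, w = x.shape
    for i in range(h):
        for j in range(w):
            out[i, j] = np.sum(padded[i:i + kh, j:j + kw] * kernel)
    return out

def chain_unit(x, kernel):
    """One hypothetical chain unit: projector + ReLU with a residual link
    (the residual connection is an assumption, not stated in the abstract)."""
    return np.maximum(projector(x, kernel), 0.0) + x

def extractor(x, kernels):
    """Extractor sketch: stacking chain units increases extraction depth."""
    for k in kernels:
        x = chain_unit(x, k)
    return x

# Toy usage: a 4x4 feature map passed through two stacked chain units.
feat = np.arange(16.0).reshape(4, 4)
blur = np.ones((3, 3)) / 9.0
out = extractor(feat, [blur, blur])
```

Stacking more kernels in `extractor` corresponds to deepening the chain network, which is the mechanism the abstract credits for the improved feature extraction capability.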