Abstract: Deep convolutional neural networks have achieved excellent performance in image semantic segmentation when trained with strong pixel-level annotations. However, acquiring pixel-level annotations is expensive and time-consuming. To overcome this problem, this study proposes a new weakly supervised image semantic segmentation method that uses only image-level annotations. The proposed method consists of three steps: (1) based on a shared network for the classification and segmentation tasks, a class-specific attention map is obtained as the derivative of the spatial class scores (the class scores of pixels in the two-dimensional image space) with respect to the network feature maps; (2) a saliency map is obtained by a successive erasing method and is used to supplement the object localization information missed by the attention map; (3) the attention map is combined with the saliency map to generate pseudo pixel-level annotations, which are then used to train the segmentation network. A series of comparative experiments on the challenging PASCAL VOC 2012 segmentation dataset demonstrates the effectiveness and improved segmentation performance of the proposed method.
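To make steps (1) and (3) concrete, the sketch below shows one plausible way to compute a gradient-based class-specific attention map and fuse it with a saliency map into pseudo pixel-level labels. This is a minimal illustration, not the authors' implementation: the ResNet-50 backbone, the choice of `layer4` as the feature layer, the target class index, the placeholder saliency map, and the 0.5 threshold are all assumptions.

```python
# Minimal sketch (assumed setup, not the paper's exact network) of:
#   step (1): attention map as the derivative of a class score w.r.t. feature maps
#   step (3): fusing the attention map with a saliency map into pseudo labels
import torch
import torch.nn.functional as F
import torchvision

model = torchvision.models.resnet50(weights=None).eval()  # stand-in backbone

features = {}
def hook(module, inp, out):
    # keep the intermediate feature maps so we can differentiate w.r.t. them
    features["feat"] = out

# hypothetical choice of feature layer: the last residual stage
handle = model.layer4.register_forward_hook(hook)

image = torch.randn(1, 3, 224, 224)        # dummy input image
logits = model(image)                      # [1, num_classes]

target_class = 12                          # assumed image-level label
score = logits[0, target_class]

# step (1): derivative of the spatial class score w.r.t. the feature maps
grads = torch.autograd.grad(score, features["feat"])[0]   # [1, C, h, w]

# collapse channels, upsample to image resolution, and normalize to [0, 1]
attention = grads.abs().sum(dim=1, keepdim=True)
attention = F.interpolate(attention, size=image.shape[-2:],
                          mode="bilinear", align_corners=False)
attention = (attention - attention.min()) / (attention.max() - attention.min() + 1e-8)

# step (3): fuse with a saliency map (placeholder here; in the paper it comes
# from the successive erasing method), then threshold into pseudo labels
saliency = torch.rand(1, 1, 224, 224)
fused = torch.maximum(attention, saliency)
pseudo_label = (fused > 0.5).long().squeeze(1)   # [1, H, W], object vs. background

handle.remove()
```

In practice the pseudo labels produced this way would carry the image-level class index rather than a binary foreground flag, and the thresholding and fusion rule (here a simple element-wise maximum) are design choices that the paper refines; the sketch only shows the overall data flow from class scores to pseudo pixel-level annotations.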