Abstract: Encoder-decoder networks based on U-Net and its variants have achieved excellent performance in semantic segmentation of medical images. However, spatial details are lost during feature extraction, which degrades segmentation accuracy, and the generalization ability and robustness of these models remain unsatisfactory. This study therefore proposes a deep convolutional encoder-decoder network with saliency guidance and uncertainty supervision for semantic segmentation of multimodal medical images. In this method, an initially generated saliency map and an uncertainty probability map serve as supervisory information to optimize the parameters of the segmentation network. Specifically, the saliency map is produced by a saliency detection network to coarsely locate the target region in an image; on this basis, the set of pixels with uncertain classification is computed to generate the uncertainty probability map. The two maps are then fed, together with the original image, into a multi-scale feature fusion network, guiding the network to focus on learning features in the target region and strengthening its representation of regions with uncertain classification and complex boundaries, thereby improving segmentation performance. Experimental results show that the proposed method captures more semantic information and outperforms existing semantic segmentation methods on multimodal medical images, with strong generalization capability and robustness.
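A minimal sketch of the pipeline outlined above, assuming a PyTorch implementation: a saliency network produces a per-pixel foreground probability map, an uncertainty map is derived from how close each pixel's probability is to 0.5 (an assumption for illustration; the abstract does not specify the uncertainty measure), and both maps are concatenated with the original image as input to the segmentation network. All class and function names (SaliencyNet, FusionSegNet, uncertainty_map) are hypothetical, not the paper's own.

```python
import torch
import torch.nn as nn

class SaliencyNet(nn.Module):
    """Coarse saliency detector: image -> per-pixel foreground probability."""
    def __init__(self, in_ch=3):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(in_ch, 16, 3, padding=1), nn.ReLU(),
            nn.Conv2d(16, 1, 3, padding=1),
        )

    def forward(self, x):
        return torch.sigmoid(self.body(x))  # saliency map in [0, 1]

class FusionSegNet(nn.Module):
    """Segmentation network taking image + saliency + uncertainty channels."""
    def __init__(self, in_ch=5, n_classes=2):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(in_ch, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, n_classes, 3, padding=1),
        )

    def forward(self, x):
        return self.body(x)  # per-class logits

def uncertainty_map(saliency):
    # Pixels near p = 0.5 are least certain; this distance-based measure
    # is an assumption chosen for illustration (entropy would also work).
    return 1.0 - (2.0 * saliency - 1.0).abs()

# Pipeline: saliency -> uncertainty -> fused input for segmentation.
image = torch.randn(1, 3, 128, 128)           # dummy input image
sal_net, seg_net = SaliencyNet(), FusionSegNet()
with torch.no_grad():
    sal = sal_net(image)                      # (1, 1, H, W) saliency map
unc = uncertainty_map(sal)                    # (1, 1, H, W) uncertainty map
fused = torch.cat([image, sal, unc], dim=1)   # (1, 5, H, W) fused input
logits = seg_net(fused)                       # segmentation logits
print(logits.shape)                           # torch.Size([1, 2, 128, 128])
```

In the full method, the two maps would additionally act as supervisory signals during training; this sketch only illustrates the input-fusion step that guides the network toward the target region.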