Abstract: With the popularization of multimodal medical images in clinical diagnosis and treatment, fusion technology based on spatiotemporal correlation characteristics has developed rapidly. Fused medical images not only retain the distinctive features of source images of various modalities but also reinforce their complementary information, which facilitates image reading. At present, most methods perform feature extraction and feature fusion with manually defined constraints, which easily leads to the loss of useful information and blurred details in the fused images. In light of this, a dual-adversarial fusion network that uses a pre-trained model for feature extraction is proposed in this study to fuse MR-T1/MR-T2 images. The network consists of a feature extraction module, a feature fusion module, and two discriminator network modules. Because registered multimodal medical image datasets are small, a feature extraction network trained from scratch cannot be trained adequately; since pre-trained models have strong data representation ability, a pre-trained convolutional neural network model is instead embedded in the feature extraction module to generate feature maps. The feature fusion network then fuses the deep features and outputs the fused image. By learning to classify source and fused images accurately, the two discriminator networks each establish an adversarial relationship with the feature fusion network and eventually drive it to learn the optimal fusion parameters. The experimental results demonstrate the effectiveness of the pre-training strategy in this method. Compared with six typical existing fusion methods, the proposed method generates fused results with the best performance in terms of both visual quality and quantitative metrics.
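To make the dual-adversarial structure concrete, the following is a minimal PyTorch sketch of the setup described above: a frozen pre-trained extractor, a fusion network, and two discriminators (one per modality) that each play an adversarial game against the fusion network. The use of early VGG-16 layers as the pre-trained extractor, the layer sizes, and the names FusionNet, Discriminator, and generator_loss are illustrative assumptions, not the paper's exact configuration.

```python
import torch
import torch.nn as nn
from torchvision.models import vgg16

class FusionNet(nn.Module):
    """Fuses deep features of the two MR modalities into a single image."""
    def __init__(self, in_channels=128):
        super().__init__()
        self.fuse = nn.Sequential(
            nn.Conv2d(in_channels, 64, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(64, 32, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(32, 1, 3, padding=1), nn.Tanh(),
        )

    def forward(self, feat_t1, feat_t2):
        # Concatenate the per-modality feature maps along the channel axis.
        return self.fuse(torch.cat([feat_t1, feat_t2], dim=1))

class Discriminator(nn.Module):
    """Classifies an input as a source image (real) or a fused image (fake)."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(1, 32, 3, stride=2, padding=1), nn.LeakyReLU(0.2),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.LeakyReLU(0.2),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(64, 1),
        )

    def forward(self, x):
        return self.net(x)

# Frozen pre-trained extractor: early VGG-16 layers (assumption), 64-channel output.
extractor = vgg16(weights="IMAGENET1K_V1").features[:4].eval()
for p in extractor.parameters():
    p.requires_grad = False

fusion = FusionNet(in_channels=128)          # 64 channels per modality, concatenated
disc_t1, disc_t2 = Discriminator(), Discriminator()
bce = nn.BCEWithLogitsLoss()

def generator_loss(t1, t2):
    """Adversarial loss pushing the fused image to fool both discriminators."""
    # Single-channel MR slices are replicated to 3 channels for the RGB-trained extractor.
    f1 = extractor(t1.repeat(1, 3, 1, 1))
    f2 = extractor(t2.repeat(1, 3, 1, 1))
    fused = fusion(f1, f2)
    real_label = torch.ones(t1.size(0), 1)
    return bce(disc_t1(fused), real_label) + bce(disc_t2(fused), real_label)
```

In training, each discriminator would alternately be updated to separate its own source modality from the fused output, while the fusion network minimizes the loss above, which is the dual-adversarial interaction the abstract refers to.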