Pre-training-driven Multimodal Boundary-aware Vision Transformer

doi:10.13328/j.cnki.jos.006768


Volume 34, Issue 5, 2023, pp. 2051-2067

Author:
SHI Ze-Nan
College of Computer Science and Technology, Jilin University, Changchun 130012, China; Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education (Jilin University), Changchun 130012, China
CHEN Hai-Peng
College of Computer Science and Technology, Jilin University, Changchun 130012, China; Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education (Jilin University), Changchun 130012, China
ZHANG Dong
Department of Computer Science and Engineering, The Hong Kong University of Science and Technology, Hong Kong 999077, China
SHEN Xuan-Jing
College of Computer Science and Technology, Jilin University, Changchun 130012, China; Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education (Jilin University), Changchun 130012, China


Abstract:

Convolutional neural networks (CNNs) have achieved successive performance breakthroughs in image forgery detection. In realistic scenarios where the tampering method is unknown, however, existing methods still fail to capture the long-range dependencies of the input image, which leaves the recognition bias problem unaddressed and reduces detection accuracy. In addition, because labeling is difficult, image forgery detection usually lacks accurate pixel-level annotations. To address these problems, this study proposes a pre-training-driven multimodal boundary-aware vision transformer. First, to capture subtle forgery traces that are invisible in the RGB domain, the method introduces the frequency-domain modality of the image and combines it with the RGB spatial domain as a multimodal embedding. Second, the encoder of the backbone network is pre-trained on ImageNet to alleviate the shortage of training samples. Third, a transformer module is integrated into the tail of this encoder so that the model captures both low-level spatial details and global context, which improves its overall representation ability. Finally, to alleviate the difficulty of localization caused by the blurred boundaries of forged regions, this study establishes a boundary-aware module. The module uses the noise distribution obtained by a Scharr convolutional layer to focus on noise information rather than semantic content, and uses a boundary residual block to sharpen boundary information, which enhances the boundary segmentation performance of the model. Extensive experiments show that the proposed method outperforms existing image forgery detection methods in recognition accuracy and is more robust and generalizes better across different forgery methods.
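The abstract does not specify how the Scharr convolutional layer is built, so the following is only a minimal sketch of the standard Scharr operator it is named after, not the authors' module. It applies the two fixed 3x3 Scharr kernels to a single-channel image and returns the gradient magnitude, which responds to intensity transitions rather than to flat regions. The function names `filter3x3` and `scharr_magnitude` are hypothetical, and the pure-Python loops stand in for what would normally be a convolution layer with fixed weights.

```python
import math

# Standard 3x3 Scharr kernels for horizontal and vertical gradients.
SCHARR_X = [[-3, 0, 3],
            [-10, 0, 10],
            [-3, 0, 3]]
SCHARR_Y = [[-3, -10, -3],
            [0, 0, 0],
            [3, 10, 3]]


def filter3x3(image, kernel):
    """Cross-correlate a 2D list with a 3x3 kernel ('valid' mode, no padding)."""
    rows, cols = len(image), len(image[0])
    out = []
    for r in range(rows - 2):
        out_row = []
        for c in range(cols - 2):
            acc = 0.0
            for i in range(3):
                for j in range(3):
                    acc += kernel[i][j] * image[r + i][c + j]
            out_row.append(acc)
        out.append(out_row)
    return out


def scharr_magnitude(image):
    """Gradient magnitude from the two Scharr responses (hypothetical helper)."""
    gx = filter3x3(image, SCHARR_X)
    gy = filter3x3(image, SCHARR_Y)
    return [[math.hypot(a, b) for a, b in zip(row_x, row_y)]
            for row_x, row_y in zip(gx, gy)]


# A 5x6 image with a vertical step edge between columns 2 and 3.
step = [[0, 0, 0, 1, 1, 1] for _ in range(5)]
edges = scharr_magnitude(step)
```

On this step image the response is zero over the flat areas and nonzero only in the windows that straddle the edge, which is the property a boundary-oriented module would rely on.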

Key words: model pre-training; multimodal; vision Transformer; boundary awareness; image forgery detection

Citation:

SHI Ze-Nan, CHEN Hai-Peng, ZHANG Dong, SHEN Xuan-Jing. Pre-training-driven Multimodal Boundary-aware Vision Transformer. Journal of Software (软件学报), 2023, 34(5): 2051-2067


History

Received: April 15, 2022
Revised: May 29, 2022
Online: September 20, 2022
Published: May 06, 2023
