RGB-D Salient Object Detection Based on Cross-modal Interactive Fusion and Global Awareness

doi:10.13328/j.cnki.jos.006833

Journal of Software, Volume 35, Issue 4, 2024, pp. 1899-1913

Authors:
SUN Fu-Ming, HU Xi-Hang, WU Jing-Yu, SUN Jing, WANG Fa-Sheng
School of Information and Communication Engineering, Dalian Minzu University, Dalian 116600, China
                    

Abstract:

In recent years, RGB-D salient object detection methods have outperformed RGB-only salient object detection models by exploiting the rich geometric structure and spatial position information in depth maps, and have therefore attracted considerable attention from the academic community. However, the performance of existing RGB-D detection models still leaves room for improvement. The emerging Transformer is good at modeling global information, while the convolutional neural network (CNN) is good at extracting local details. Effectively combining the advantages of CNN and Transformer to mine both global and local information can therefore help improve the accuracy of salient object detection. To this end, this study proposes an RGB-D salient object detection method based on cross-modal interactive fusion and global awareness. A Transformer network is embedded into U-Net so that features are extracted by combining the global attention mechanism with local convolution. First, with the help of the U-Net encoder-decoder structure, the method efficiently extracts multi-level complementary features and decodes them step by step to generate a salient feature map. Then, the Transformer module learns the global dependencies among high-level features to enhance the feature representation, and a progressive upsampling fusion strategy is used to process the input and reduce the introduction of noise. Moreover, to reduce the negative impact of low-quality depth maps, a cross-modal interactive fusion module is designed to realize cross-modal feature fusion. Finally, experimental results on five benchmark datasets show that the proposed algorithm outperforms other recent algorithms.

Key words: salient object detection (SOD); cross-modal; global attention mechanism; RGB-D detection model
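
The two ideas named in the abstract, fusing RGB and depth features and applying global attention to high-level features, can be illustrated with a minimal NumPy sketch. This is not the authors' network: the abstract does not specify the internals of the fusion module or the Transformer module, so the function names `cross_modal_fuse` and `global_self_attention`, the channel-gating scheme, the single attention head, and the random projection matrices are all assumptions chosen only to show the general mechanics.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def softmax(x, axis=-1):
    # Subtract the row maximum for numerical stability.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)


def cross_modal_fuse(rgb, depth):
    """Hypothetical interactive fusion of two (C, H, W) feature maps.

    Each modality is re-weighted by a channel gate computed from the other
    modality (a simple stand-in for a learned interaction), then summed.
    """
    # Channel gates from global average pooling, shape (C, 1, 1).
    gate_from_rgb = sigmoid(rgb.mean(axis=(1, 2), keepdims=True))
    gate_from_depth = sigmoid(depth.mean(axis=(1, 2), keepdims=True))
    return rgb * gate_from_depth + depth * gate_from_rgb


def global_self_attention(feat, w_q, w_k, w_v):
    """Single-head self-attention over all spatial positions of (C, H, W).

    w_q, w_k, w_v are assumed (C, C) projection matrices.
    """
    c, h, w = feat.shape
    tokens = feat.reshape(c, h * w).T  # (N, C): one token per position
    q, k, v = tokens @ w_q, tokens @ w_k, tokens @ w_v
    # (N, N) weights: every position attends to every other position.
    attn = softmax(q @ k.T / np.sqrt(q.shape[-1]))
    out = tokens + attn @ v  # residual connection
    return out.T.reshape(c, h, w), attn


# Toy high-level features; real features would come from a CNN encoder.
rng = np.random.default_rng(0)
c, h, w = 8, 4, 4
rgb = rng.standard_normal((c, h, w))
depth = rng.standard_normal((c, h, w))
w_q, w_k, w_v = (rng.standard_normal((c, c)) * 0.1 for _ in range(3))

fused = cross_modal_fuse(rgb, depth)
enhanced, attn = global_self_attention(fused, w_q, w_k, w_v)
print(fused.shape, enhanced.shape, attn.shape)
```

In this sketch the attention matrix has one row per spatial position, which is why such a module is usually applied only to the small, high-level feature maps: the cost grows quadratically with the number of positions.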

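Citation:

SUN Fu-Ming, HU Xi-Hang, WU Jing-Yu, SUN Jing, WANG Fa-Sheng. RGB-D Salient Object Detection Based on Cross-modal Interactive Fusion and Global Awareness. Journal of Software, 2024, 35(4): 1899-1913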

History

Received: June 29, 2022
Revised: September 1, 2022
Online: June 14, 2023
Published: April 6, 2024
