A Cross-modal Privileged Information Enhanced Method for Image Classification
Fund Project:

The Excellent Youth Scholars Program of Shandong Province; the National Natural Science Foundation of China; the National Key R&D Program of China

Abstract:

The performance of image classification is limited by the diversity of visual information and the influence of background noise. Existing work usually applies cross-modal constraints or heterogeneous alignment algorithms to learn highly discriminative visual representations. However, differences in the feature distributions of different sources limit the effective learning of visual representations. To address this problem, this paper proposes CMIF, an image classification framework based on cross-modal semantic information inference and fusion. It introduces semantic descriptions of images and statistical knowledge as privileged information and, following the privileged-information learning paradigm, uses them to guide the mapping of image features from the visual space to the semantic space during training; a class-aware information selection (CIS) algorithm is proposed to learn cross-modal enhanced representations of images. To handle the heterogeneous feature differences in representation learning, a partial heterogeneous alignment (PHA) algorithm is used to achieve cross-modal alignment between visual features and the semantic features extracted from privileged information. To further suppress the noise introduced in the mapping from the visual space to the semantic space, the graph-fusion-based CIS algorithm selects and reconstructs the key information in the semantic representation so that it effectively supplements the visual prediction information. Experiments on the cross-modal classification datasets VireoFood-172 and NUS-WIDE show that CMIF learns robust visual-semantic features and achieves stable performance improvements with both the convolution-based ResNet-50 and the Transformer-based ViT image classification models.
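As an illustration of the training pipeline the abstract describes (privileged semantic information guiding a visual-to-semantic mapping that is available only at training time), below is a minimal PyTorch sketch. The class and function names (CMIFSketch, training_step), feature dimensions, and the cosine alignment loss standing in for the PHA objective are assumptions made for illustration, not the paper's actual implementation; the graph-fusion-based CIS step is omitted.

    # Minimal sketch: privileged semantic features guide a visual-to-semantic
    # mapping during training and are dropped at inference. All names, sizes,
    # and loss choices here are illustrative assumptions.
    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class CMIFSketch(nn.Module):
        def __init__(self, num_classes, vis_dim=2048, sem_dim=768, embed_dim=512):
            super().__init__()
            self.visual_proj = nn.Linear(vis_dim, embed_dim)    # backbone features (e.g., ResNet-50)
            self.semantic_proj = nn.Linear(sem_dim, embed_dim)  # privileged text/statistics features
            self.vis_to_sem = nn.Linear(embed_dim, embed_dim)   # visual space -> semantic space
            self.classifier = nn.Linear(embed_dim, num_classes)

        def forward(self, vis_feat, sem_feat=None):
            v = F.relu(self.visual_proj(vis_feat))
            logits = self.classifier(v)
            if sem_feat is None:                 # inference: privileged information is absent
                return logits, None, None
            s = F.relu(self.semantic_proj(sem_feat))
            v2s = self.vis_to_sem(v)             # mapped visual features for alignment
            return logits, v2s, s

    def training_step(model, vis_feat, sem_feat, labels, alpha=0.5):
        # Classification loss plus a simple alignment term between the mapped
        # visual features and the privileged semantic features.
        logits, v2s, s = model(vis_feat, sem_feat)
        cls_loss = F.cross_entropy(logits, labels)
        align_loss = 1.0 - F.cosine_similarity(v2s, s, dim=-1).mean()
        return cls_loss + alpha * align_loss

    # Example usage with random tensors (172 classes as in VireoFood-172):
    model = CMIFSketch(num_classes=172)
    loss = training_step(model, torch.randn(8, 2048), torch.randn(8, 768),
                         torch.randint(0, 172, (8,)))

The key design point this sketch reflects is that the semantic branch is consulted only when privileged information is supplied, so the deployed classifier needs nothing beyond the visual features.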

History
  • Received: December 06, 2022
  • Revised: July 10, 2023
  • Accepted: September 24, 2023