Abstract: As a cross-modal understanding task, video question answering (VideoQA) requires the interaction of semantic information across different modalities to answer questions about a given video. In recent years, graph neural networks (GNNs) have made remarkable progress on VideoQA tasks owing to their powerful capabilities in cross-modal information fusion and reasoning. However, many existing GNN approaches fail to further improve the performance of VideoQA models because of their inherent deficiencies of overfitting or over-smoothing, as well as weak robustness and generalization. In view of the effectiveness and robustness of self-supervised contrastive learning in pre-training techniques, this study proposes GMC, a self-supervised graph contrastive learning framework for VideoQA based on the idea of graph data augmentation. The framework uses two independent data augmentation operations, one on nodes and one on edges, to generate dissimilar subsamples, and improves the consistency between the predicted graph data distributions of the original samples and their augmented subsamples, yielding higher accuracy and robustness in VideoQA models. The effectiveness of the proposed framework is verified through experimental comparisons with state-of-the-art VideoQA models and with different GMC variants on a public VideoQA dataset.
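
To make the core idea concrete, the following is a minimal PyTorch sketch of the two augmentation operations and a consistency objective of the kind the abstract describes. All function names, the masking/dropping strategies, and the symmetric KL loss are illustrative assumptions for exposition, not the authors' actual GMC implementation.

```python
import torch
import torch.nn.functional as F

def augment_nodes(x: torch.Tensor, edge_index: torch.Tensor, drop_prob: float = 0.2):
    """Node-level augmentation (assumed variant): randomly zero out node features."""
    mask = torch.rand(x.size(0), device=x.device) > drop_prob
    x_aug = x.clone()
    x_aug[~mask] = 0.0  # dropped nodes contribute no features
    return x_aug, edge_index

def augment_edges(x: torch.Tensor, edge_index: torch.Tensor, drop_prob: float = 0.2):
    """Edge-level augmentation (assumed variant): randomly remove a fraction of edges."""
    keep = torch.rand(edge_index.size(1), device=edge_index.device) > drop_prob
    return x, edge_index[:, keep]

def consistency_loss(p_orig: torch.Tensor, p_aug: torch.Tensor) -> torch.Tensor:
    """Symmetric KL divergence between the predicted answer distributions of the
    original graph and an augmented subsample (one plausible consistency measure)."""
    kl = lambda p, q: F.kl_div(q.log(), p, reduction="batchmean")  # KL(p || q)
    return 0.5 * (kl(p_orig, p_aug) + kl(p_aug, p_orig))

# Hypothetical usage: `model` maps a graph to answer logits.
# p_orig = F.softmax(model(x, edge_index), dim=-1)
# p_node = F.softmax(model(*augment_nodes(x, edge_index)), dim=-1)
# p_edge = F.softmax(model(*augment_edges(x, edge_index)), dim=-1)
# loss = consistency_loss(p_orig, p_node) + consistency_loss(p_orig, p_edge)
```

Under these assumptions, the two augmentations are applied independently so that each subsample differs from the original along only one axis (node features or graph connectivity), and the consistency term regularizes the model toward predictions that are stable under both perturbations.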