Survey on Multimodal Information Extraction Research

doi:10.13328/j.cnki.jos.007245

Volume 36, Issue 4, 2025, pp. 1665-1691. DOI:10.13328/j.cnki.jos.007245

Abstract:

Multimodal information extraction is the task of extracting structured knowledge from unstructured or semi-structured multimodal data, such as text and images. It comprises three subtasks: multimodal named entity recognition, multimodal relation extraction, and multimodal event extraction. This study analyzes these tasks and summarizes the component common to all three subtasks, namely the multimodal representation and fusion module. It then reviews the commonly used datasets and mainstream research methods for each subtask. Finally, it outlines research trends in multimodal information extraction and analyzes the open problems and challenges in the field to provide a reference for future research.

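Citation:

Wang Yongsheng (王永胜), Li Peifeng (李培峰), Wang Zhongqing (王中卿), Zhu Qiaoming (朱巧明). Survey on Multimodal Information Extraction Research. Journal of Software (软件学报), 2025, 36(4): 1665-1691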

History

Received: September 13, 2023
Revised: February 25, 2024
Online: December 9, 2024
Published: April 6, 2025

Copyright: Institute of Software, Chinese Academy of Sciences Beijing ICP No. 05046678-4
