End-to-end Speech Translation by Integrating Cross-modal Information

doi:10.13328/j.cnki.jos.006413

Volume 34, Issue 4, 2023, pp. 1837-1849

Abstract:

Speech translation aims to translate speech in one language into speech or text in another language. Compared with pipeline systems, end-to-end speech translation models offer low latency, less error propagation, and a small storage footprint, and have therefore attracted much attention. However, an end-to-end model must not only process long speech sequences and extract acoustic information, but also learn the alignment between the source speech and the target text, which makes modeling difficult and leads to poor performance. This study proposes an end-to-end speech translation model with cross-modal information fusion, which deeply combines a text-based machine translation model with a speech translation model. To address the length inconsistency between speech and text, a redundancy filter is proposed that removes redundant acoustic information so that the length of the filtered acoustic representation is consistent with that of the corresponding text. To learn the alignment relationship, parameter sharing is applied to embed the whole machine translation model into the speech translation model with multi-task training. Experimental results on public speech translation data sets show that the proposed method can significantly improve model performance.
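
The abstract does not say how the redundancy filter is implemented. As a rough illustration of the idea of shrinking a frame-level acoustic sequence to a text-like length, the sketch below assumes a CTC-style scheme: each frame carries a predicted label, blank frames are dropped, and each run of identical labels is averaged into one vector. The function name `filter_redundant_frames`, the blank id `0`, and the toy data are all hypothetical and are not taken from the paper.

```python
from itertools import groupby

BLANK = 0  # assumed id of the blank label (hypothetical, not from the paper)


def filter_redundant_frames(frames, labels, blank=BLANK):
    """Collapse frame-level vectors to one vector per predicted token.

    frames: list of equal-length float lists, one per acoustic frame.
    labels: list of per-frame label ids (for example, the argmax of a CTC head).
    Blank frames are dropped; each run of consecutive identical non-blank
    labels is averaged into a single vector.
    """
    if len(frames) != len(labels):
        raise ValueError("frames and labels must have the same length")
    tokens, vectors = [], []
    # groupby yields maximal runs of consecutive identical labels.
    for label, run in groupby(zip(labels, frames), key=lambda pair: pair[0]):
        if label == blank:
            continue  # blank frames carry no token and are discarded
        run_frames = [frame for _, frame in run]
        # Average the run dimension by dimension.
        mean = [sum(col) / len(run_frames) for col in zip(*run_frames)]
        tokens.append(label)
        vectors.append(mean)
    return tokens, vectors


# Toy input: six 2-dimensional frames with per-frame label predictions.
demo_frames = [[1.0, 0.0], [3.0, 2.0], [0.0, 0.0],
               [5.0, 5.0], [9.0, 9.0], [7.0, 1.0]]
demo_labels = [4, 4, 0, 4, 0, 7]
demo_tokens, demo_vectors = filter_redundant_frames(demo_frames, demo_labels)
print(len(demo_frames), "frames ->", len(demo_vectors), "vectors")
```

In this toy case the two leading frames labelled `4` are merged, the blank frames are removed, and the filtered sequence has one vector per remaining token, which is the kind of length match with text that the abstract describes.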

Citation:

Liu Yuchen, Zong Chengqing. End-to-end Speech Translation by Integrating Cross-modal Information. Journal of Software, 2023, 34(4): 1837-1849

History

Received: December 29, 2020
Revised: March 13, 2021
Online: July 15, 2022

Copyright: Institute of Software, Chinese Academy of Sciences