End-to-end Speech Translation by Integrating Cross-modal Information

doi:10.13328/j.cnki.jos.006413


Author: LIU Yu-Chen, ZONG Cheng-Qing

Affiliation: National Laboratory of Pattern Recognition (Institute of Automation, Chinese Academy of Sciences), Beijing 100190, China; School of Artificial Intelligence, University of Chinese Academy of Sciences, Beijing 100049, China

Author biographies: LIU Yu-Chen (1995-), male, Ph.D., research interests: natural language processing, machine translation, and speech translation; ZONG Cheng-Qing (1963-), male, Ph.D., research professor, CCF Fellow, research interests: natural language processing, machine translation, text data mining, and language cognitive computing.

Fund Project: Key Program of the National Natural Science Foundation of China (U1836221)


Abstract:

Speech translation aims to translate speech in one language into speech or text in another language. Compared with cascaded (pipeline) systems, end-to-end speech translation models offer lower latency, less error propagation, and a smaller storage footprint, and have therefore attracted growing attention. However, an end-to-end model must not only process long speech sequences and extract the acoustic information they carry, but also learn the alignment between source-language speech and target-language text, which makes modeling difficult and leads to poor performance. This study proposes an end-to-end speech translation model with cross-modal information fusion, which tightly integrates a text-based machine translation model with the speech translation model. To address the length mismatch between speech and text sequences, a redundancy filter removes redundant information from the acoustic representation so that the length of the filtered acoustic state sequence is as close as possible to that of the corresponding text sequence. To address the difficulty of learning the alignment, parameter sharing is used to embed the entire text machine translation model into the speech translation model, and the alignment between source speech and target text is learned through multi-task training. Experimental results on public speech translation datasets show that the proposed method can significantly improve speech translation performance.

Key words: speech translation; neural machine translation; end-to-end model; multi-modal learning
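
The abstract says that redundant acoustic information is filtered out so that the acoustic sequence length approaches the text length, but it does not say how. The sketch below illustrates one common way to do this kind of shrinking, assuming that each speech frame already carries a label predicted by a CTC-style output layer: blank frames are dropped and runs of identical labels are averaged. The function name `shrink_acoustic_states`, the `BLANK` id, and the toy data are all hypothetical, and this is not claimed to be the authors' filter.

```python
from itertools import groupby

BLANK = 0  # hypothetical id of the CTC blank label (an assumption)


def shrink_acoustic_states(states, frame_labels, blank=BLANK):
    """Collapse frame-level states to roughly token-level length.

    states: list of equal-length float lists, one vector per speech frame.
    frame_labels: one label id per frame, e.g. the argmax of a CTC layer.
    Frames labelled `blank` are dropped; each run of consecutive frames
    with the same non-blank label is replaced by the mean of its vectors.
    """
    if len(states) != len(frame_labels):
        raise ValueError("states and frame_labels must have the same length")
    shrunk = []
    for label, run in groupby(zip(frame_labels, states), key=lambda p: p[0]):
        if label == blank:
            continue  # blank frames carry no token and are filtered out
        vectors = [vec for _, vec in run]
        dim = len(vectors[0])
        # Average the run dimension by dimension.
        shrunk.append([sum(v[d] for v in vectors) / len(vectors)
                       for d in range(dim)])
    return shrunk


# Toy input: six frames with 2-dimensional states.
states = [[1.0, 0.0], [3.0, 2.0], [0.0, 0.0],
          [5.0, 5.0], [0.0, 0.0], [0.0, 0.0]]
labels = [4, 4, 0, 7, 0, 0]
filtered = shrink_acoustic_states(states, labels)
print(len(states), "frames ->", len(filtered), "states")
```

In this toy case six frames shrink to two states, one per predicted token, which is the kind of length agreement between the acoustic and text sequences that the abstract describes.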

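Cite this article

LIU Yu-Chen, ZONG Cheng-Qing. End-to-end speech translation by integrating cross-modal information. Ruan Jian Xue Bao/Journal of Software, 2023, 34(4): 1837-1849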


History

Received: 2020-12-29
Revised: 2021-03-13
Published online: 2022-07-15
Published: 2023-04-06
