Abstract: To further utilize near-field speech data to improve the performance of far-field speech recognition, this paper proposes an approach that integrates knowledge distillation with a generative adversarial network. In this work, a multi-task learning structure is first proposed to jointly train the acoustic model with feature mapping. To enhance the acoustic modeling, the acoustic model trained with far-field data (the student model) is guided by an acoustic model trained with near-field data (the teacher model). This training makes the student model mimic the behavior of the teacher model by minimizing the Kullback-Leibler divergence between their outputs. To improve the speech enhancement, an additional discriminator network is introduced to distinguish the enhanced features from the real clean ones. Through this adversarial multi-task training, the distribution of the enhanced features is pushed further towards that of the clean features. Evaluated on AMI single distant microphone data, the method achieves a 5.6% relative word error rate (WER) reduction on non-overlapped speech and a 4.7% relative WER reduction on overlapped speech over the baseline model. Evaluated on AMI multi-channel distant microphone data, it achieves a 6.2% relative WER reduction on non-overlapped speech and a 4.1% relative WER reduction on overlapped speech over the baseline model. Evaluated on TIMIT data, the method achieves a 7.2% WER reduction. To better demonstrate the effect of the generative adversarial network on speech enhancement, the enhanced features are visualized and the effectiveness of the method is verified.
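As a rough, hedged sketch (an assumed formulation for illustration, not the paper's exact objective), the multi-task training described above can be written as a weighted combination of the student's cross-entropy loss, a Kullback-Leibler distillation term toward the teacher's posteriors, and an adversarial term for the feature-mapping front end; the weights \lambda, the enhancement network G, the discriminator D, and the posteriors p_S, p_T are assumed notation.

% Assumed notation: p_S, p_T are student/teacher output posteriors; G is the
% feature-mapping (enhancement) network; D is the discriminator; \lambda's are
% interpolation weights treated here as hyperparameters.
\mathcal{L}_{\text{student}} =
    (1-\lambda_{\mathrm{KD}})\,\mathcal{L}_{\mathrm{CE}}\!\big(y,\, p_S(G(x_{\mathrm{far}}))\big)
  + \lambda_{\mathrm{KD}}\, D_{\mathrm{KL}}\!\big(p_T(x_{\mathrm{near}}) \,\big\|\, p_S(G(x_{\mathrm{far}}))\big)
  + \lambda_{\mathrm{adv}}\,\mathcal{L}_{\mathrm{adv}},
\qquad
\mathcal{L}_{\mathrm{adv}} = -\,\mathbb{E}\big[\log D\big(G(x_{\mathrm{far}})\big)\big]

Here the adversarial term pushes the enhanced features G(x_far) toward the distribution of real clean (near-field) features that D is trained to recognize, matching the abstract's description of the adversarial multi-task training.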