Speech Mask Estimation Using the Time-Frequency Correlation of Speech Presence

Fund Project:

National Natural Science Foundation of China (11461141004, 91120001, 61271426); Strategic Priority Research Program of the Chinese Academy of Sciences (XDA06030100, XDA06030500); National High-Tech R&D Program of China (863 Program) (2012AA012503); CAS Priority Deployment Project (KGZD-EW-103-2)

    Abstract (Chinese version):

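    In the two-dimensional time-frequency (TF) grid, the presence or absence of speech at neighboring points is correlated, and a conventional Markov chain cannot adaptively model this two-dimensional TF correlation. Based on the TF correlation of speech, this paper proposes a method that estimates the speech mask with a two-dimensional correlation model. The method classifies the log-power spectrum of noisy speech in the TF domain into speech and non-speech classes. It describes the time correlation of speech with a time state-transition probability and a forward factor, and the frequency correlation with a frequency state-transition probability and a neighbor factor. Through global statistical optimization, the model combines the time and frequency correlations. A sequential update scheme is given that updates the model and estimates the speech presence probability frame by frame. Given the currently observed log-power spectrum and the model parameters, the speech state matrix obtained by maximizing the a posteriori probability serves as the optimal estimate of the speech mask. The method was compared with several existing online speech mask estimation methods, and the experimental results demonstrate its superiority.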
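The time-correlation part of the model (a time state-transition probability and a forward factor, updated frame by frame to give a speech presence probability) is closely related to the forward recursion of a two-state hidden Markov model. The sketch below shows that recursion for a single frequency bin. It is plain HMM forward filtering, not the authors' 2-D model: the Gaussian class-conditional model of the log-power, the function names, and all numeric values are assumptions chosen for illustration.

```python
import math

# Illustrative sketch only: plain two-state HMM forward filtering for ONE
# frequency bin. State 0 = speech absent, state 1 = speech present.
# The Gaussian log-power model and all parameter values are assumptions.

def gaussian_pdf(x, mean, std):
    """Likelihood of a log-power value x under a Gaussian class model."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))

def forward_update(prev_post, log_power, trans, means, stds):
    """One sequential (frame-by-frame) forward step.

    prev_post: [P(absent), P(present)] after the previous frame
    trans[i][j]: time state-transition probability P(s_t = j | s_{t-1} = i)
    Returns the normalized posterior [P(absent), P(present)] for this frame.
    """
    # Predict: propagate the previous posterior through the transition matrix.
    prior = [sum(prev_post[i] * trans[i][j] for i in range(2)) for j in range(2)]
    # Correct: weight each state by the likelihood of the observed log-power.
    unnorm = [prior[j] * gaussian_pdf(log_power, means[j], stds[j]) for j in range(2)]
    total = sum(unnorm)
    return [u / total for u in unnorm]

def speech_presence_track(log_powers, trans, means, stds, init=(0.5, 0.5)):
    """Run the forward recursion over successive frames of one bin."""
    post = list(init)
    track = []
    for x in log_powers:
        post = forward_update(post, x, trans, means, stds)
        track.append(post[1])  # probability that speech is present
    return track

if __name__ == "__main__":
    trans = [[0.9, 0.1], [0.2, 0.8]]  # assumed "sticky" time transitions
    means = [-5.0, 2.0]               # assumed log-power means (absent, present)
    stds = [1.5, 2.0]
    frames = [-5.2, -4.8, 1.5, 2.3, 2.0, -4.9]
    for t, p in enumerate(speech_presence_track(frames, trans, means, stds)):
        print(f"frame {t}: P(speech) = {p:.3f}")
```

Because each step needs only the previous frame's posterior, the update runs online, which is the property a sequential scheme relies on; the frequency coupling through the neighbor factor is not modeled in this sketch.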

    Abstract:

    This paper proposes a method to estimate the spectrographic speech mask based on a two-dimensional (2-D) correlation model. The method is motivated by the fact that the time and frequency correlations of speech presence are interwoven in the time-frequency (TF) domain, whereas a conventional Markov chain cannot model both correlations simultaneously and adaptively. The 2-D correlation model describes the correlation of speech presence in the TF domain, with speech presence and speech absence taken as the two states of the model. The time correlation is modeled by a time state-transition probability and a forward factor, while a frequency state-transition probability and a corresponding neighbor factor are defined to describe the frequency correlation. The time and frequency correlations are incorporated into the model by maximizing the Q-function. A sequential scheme is presented to estimate the parameter set online. Given the observed spectrum and the parameter set, the state matrix that maximizes the a posteriori probability is taken as the optimal estimate of the speech mask. The proposed method was compared with several well-established methods, and the experimental results confirmed its superiority.
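The frequency correlation and the MAP mask decision can be loosely illustrated within one frame by treating adjacent frequency bins as a chain linked by a frequency state-transition matrix, then choosing the 0/1 labeling that maximizes the product of per-bin scores and transition probabilities with the Viterbi algorithm. This is a swapped-in 1-D simplification, not the paper's 2-D model or its Q-function optimization; the per-bin scores (for example, a time-predicted prior times a log-power likelihood), the transition values, and the function name `viterbi_frequency_mask` are hypothetical.

```python
import math

# Illustrative sketch (NOT the paper's 2-D model): for a single frame, find the
# speech/non-speech labeling s_0..s_{K-1} across frequency bins that maximizes
#   prod_k unary[k][s_k] * prod_{k>0} freq_trans[s_{k-1}][s_k]
# using the Viterbi algorithm in the log domain.
# unary[k][s]: assumed per-bin score for state s (0 = absent, 1 = present).
# freq_trans: hypothetical frequency state-transition matrix between neighbors.

def viterbi_frequency_mask(unary, freq_trans):
    """Return the max-product 0/1 mask over the bins of one frame."""
    n_bins = len(unary)
    log_a = [[math.log(p) for p in row] for row in freq_trans]
    # score[s]: best log-score of a labeling of bins 0..k that ends in state s
    score = [math.log(unary[0][s]) for s in range(2)]
    back = []  # back[k-1][s]: best predecessor state for bin k in state s
    for k in range(1, n_bins):
        new_score, ptr = [], []
        for s in range(2):
            cands = [score[r] + log_a[r][s] for r in range(2)]
            best = max(range(2), key=lambda r: cands[r])
            new_score.append(cands[best] + math.log(unary[k][s]))
            ptr.append(best)
        score = new_score
        back.append(ptr)
    # Trace back the best state sequence, starting from the last bin.
    state = max(range(2), key=lambda s: score[s])
    mask = [state]
    for ptr in reversed(back):
        state = ptr[state]
        mask.append(state)
    mask.reverse()
    return mask

if __name__ == "__main__":
    unary = [[0.2, 0.8], [0.2, 0.8], [0.55, 0.45],
             [0.2, 0.8], [0.9, 0.1], [0.9, 0.1]]
    freq_trans = [[0.8, 0.2], [0.2, 0.8]]  # assumed: neighboring bins tend to agree
    print(viterbi_frequency_mask(unary, freq_trans))  # [1, 1, 1, 1, 0, 0]
```

In the usage example, bin 2 has weak evidence for speech absence, so an independent per-bin decision would label it non-speech; the assumed frequency transitions favor a contiguous speech region, so the chain labels it speech.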

Cite this article

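Zhan Ge (战鸽), Huang Zhaoqiong (黄兆琼), Ying Dongwen (应冬文), Pan Jielin (潘接林), Yan Yonghong (颜永红). Speech mask estimation using the time-frequency correlation of speech presence. Journal of Software (软件学报), 2016, 27(S2): 64-68.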

History
  • Received: 2015-06-01
  • Last revised: 2016-01-05
  • Published online: 2017-01-10
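Copyright: Institute of Software, Chinese Academy of Sciences
Address: No. 4 South Fourth Street, Zhongguancun, Haidian District, Beijing, postal code 100190
Tel: 010-62562563  Fax: 010-62562533  Email: jos@iscas.ac.cn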