    Abstract:

    With the development of multimedia technology, video is used in a growing number of fields, and captions are frequently inserted into video images to aid audience understanding. This paper proposes a robust endpoint detection algorithm for continuous speech in noisy environments, which can be used in automatic video caption generation systems. The proposed algorithm integrates the widely used energy, zero-crossing rate, and entropy measures into a new feature, the EZE-feature, which retains the advantages of each individual feature while compensating for its drawbacks. Moreover, an adaptive endpoint detection method is proposed in which the EZE-feature adjusts its environment parameters according to the strength of the background noise. The proposed algorithm has been applied in an automatic video caption generation system, where it performs well.
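
    The abstract does not give the EZE-feature formula or the adaptation rule, so the following Python sketch is only an illustration of the general idea, not the authors' method. The product used in `eze_feature`, the multiplicative threshold `ratio`, the smoothing factor `alpha`, and every function name are assumptions introduced here. The sketch also assumes that the first few frames contain only background noise.

```python
import cmath
import math


def split_frames(signal, size, hop):
    """Cut the signal into overlapping fixed-length frames."""
    return [signal[i:i + size] for i in range(0, len(signal) - size + 1, hop)]


def short_time_energy(frame):
    """Mean squared amplitude of one frame."""
    return sum(x * x for x in frame) / len(frame)


def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    crossings = sum((a >= 0) != (b >= 0) for a, b in zip(frame, frame[1:]))
    return crossings / (len(frame) - 1)


def spectral_entropy(frame):
    """Entropy of the normalized power spectrum, scaled to [0, 1]."""
    n = len(frame)
    power = []
    for k in range(1, n // 2):  # naive DFT; acceptable for short frames
        coeff = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                    for t, x in enumerate(frame))
        power.append(abs(coeff) ** 2)
    total = sum(power)
    if total == 0.0:
        return 1.0  # treat digital silence as a maximally flat spectrum
    h = -sum((p / total) * math.log(p / total) for p in power if p > 0.0)
    return h / math.log(len(power))


def eze_feature(frame):
    """ASSUMED combination (not taken from the paper): large when the frame
    is energetic and its spectrum is structured, i.e. has low entropy."""
    structure = max(0.0, 1.0 - spectral_entropy(frame))
    return short_time_energy(frame) * (1.0 + zero_crossing_rate(frame)) * structure


def detect_endpoints(signal, size=64, hop=32, init_frames=10, ratio=4.0, alpha=0.95):
    """Return (start_sample, end_sample) pairs for detected speech segments.

    The noise floor starts as the mean feature of the first `init_frames`
    frames (assumed noise-only) and is smoothed over every later frame
    classified as non-speech, so the threshold follows the noise level.
    """
    feats = [eze_feature(f) for f in split_frames(signal, size, hop)]
    noise_floor = sum(feats[:init_frames]) / init_frames
    segments, start = [], None
    for i, value in enumerate(feats):
        if value > ratio * noise_floor:
            if start is None:
                start = i  # speech onset
        else:
            # Hypothetical adaptation rule: exponential smoothing of the floor.
            noise_floor = alpha * noise_floor + (1.0 - alpha) * value
            if start is not None:
                segments.append((start * hop, (i - 1) * hop + size))
                start = None
    if start is not None:  # signal ended while still in speech
        segments.append((start * hop, (len(feats) - 1) * hop + size))
    return segments
```

    Calling `detect_endpoints(samples)` on a list of floating-point samples returns sample-index pairs that could then be converted to caption timestamps by dividing by the sampling rate. A multiplicative threshold is used here only because it is simple to reason about; the paper's own adaptation scheme may differ.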

Citation:

Li Q, Ma HD, Feng S. A speech endpoint detection algorithm for automatic caption generation systems. Journal of Software, 2008,19(zk):96-103

History
  • Received: May 1, 2008
  • Revised: November 25, 2008