A Robust Endpoint Detection Algorithm for Video Caption Generation

Journal of Software, Volume 19, Issue zk, 2008, pp. 96-103

Author: LI Qi, MA Hua-Dong, FENG Shuo

Abstract:

With the development of multimedia technology, video is used in a growing number of fields, and captions are frequently inserted into video images to aid audience understanding. This paper proposes a robust endpoint detection algorithm for continuous speech in noisy environments, which can be used in automatic video caption generation systems. The algorithm integrates three widely used features (energy, zero crossing, and entropy) into a new feature, the EZE-feature, which retains the advantages of each individual feature while compensating for its drawbacks. In addition, an adaptive endpoint detection method is proposed in which the EZE-feature adjusts its environment parameters according to the strength of the background noise. The proposed algorithm has been applied in an automatic video caption generation system, where it performs well.

Key words: endpoint detection; caption; video caption generation; audio analysis; speech recognition
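
The abstract names the ingredients of the EZE-feature but not its formula, so the following is only a minimal sketch of the general idea under stated assumptions. The product used in `eze_score`, the frame size, the `margin` multiplier, and the assumption that the first `noise_frames` frames contain background noise only are all hypothetical choices made for illustration; they are not taken from the paper. A sine tone embedded in Gaussian noise stands in for speech.

```python
import cmath
import math
import random

FRAME = 64  # samples per frame (assumed value)
HOP = 32    # hop between successive frames (assumed value)


def split_frames(signal, size=FRAME, hop=HOP):
    """Cut the signal into overlapping fixed-size frames."""
    return [signal[i:i + size] for i in range(0, len(signal) - size + 1, hop)]


def energy(frame):
    """Short-time energy: mean squared amplitude of the frame."""
    return sum(x * x for x in frame) / len(frame)


def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)


def spectral_entropy(frame):
    """Entropy of the normalized power spectrum (naive DFT, DC bin skipped)."""
    n = len(frame)
    power = []
    for k in range(1, n // 2):
        coeff = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                    for t, x in enumerate(frame))
        power.append(abs(coeff) ** 2)
    total = sum(power)
    if total == 0.0:
        return math.log(len(power))  # treat a silent frame as maximally flat
    return -sum((p / total) * math.log(p / total) for p in power if p > 0.0)


def eze_score(frame):
    """Hypothetical combination: loud frames with a peaked spectrum score high."""
    h_max = math.log(len(frame) // 2 - 1)
    peakedness = max(0.0, h_max - spectral_entropy(frame))
    return energy(frame) * (1.0 + zero_crossing_rate(frame)) * peakedness


def detect_endpoints(signal, noise_frames=10, margin=10.0):
    """Return (start, end) sample indices of the active region, or None.

    The threshold is derived from the leading frames, which are assumed to
    hold background noise only, so it scales with the noise strength.
    """
    scores = [eze_score(f) for f in split_frames(signal)]
    head = scores[:noise_frames]
    if not head:
        return None
    threshold = margin * (sum(head) / len(head))
    active = [i for i, s in enumerate(scores) if s > threshold]
    if not active:
        return None
    return active[0] * HOP, active[-1] * HOP + FRAME


def make_demo_signal(seed=0):
    """Gaussian noise with a sine tone (stand-in for speech) in samples 640..1279."""
    rng = random.Random(seed)
    signal = []
    for t in range(1920):
        sample = rng.gauss(0.0, 0.05)
        if 640 <= t < 1280:
            sample += math.sin(2 * math.pi * 5 * t / FRAME)
        signal.append(sample)
    return signal


print(detect_endpoints(make_demo_signal()))
```

Because the threshold is a multiple of the measured noise floor rather than a fixed constant, the same code can be run on louder or quieter backgrounds without retuning, which is the kind of adaptation the abstract describes. A practical detector would also smooth the decision over several frames.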

Citation

LI Qi, MA Hua-Dong, FENG Shuo. A robust endpoint detection algorithm for video caption generation. Journal of Software, 2008, 19(zk): 96-103.

History

Received: May 01, 2008
Revised: November 25, 2008
