Abstract: With the development of multimedia technology, video is used in a growing number of fields, and captions are frequently inserted into video frames to aid audience understanding. This paper proposes a robust endpoint detection algorithm for continuous speech in noisy environments that can be used in automatic video caption generation systems. The algorithm integrates three widely used features (energy, zero-crossing rate and entropy) into a new combined feature, the EZE-feature, which retains the advantages of each individual feature while compensating for their drawbacks. In addition, an adaptive endpoint detection method is proposed in which the environment parameters of the EZE-feature are adjusted according to the strength of the background noise. The proposed algorithm has been deployed in an automatic video caption generation system, where it performed well.
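The abstract does not give the EZE-feature formula or the adaptation rule, so the following is only a minimal sketch under stated assumptions. It computes per-frame energy, zero-crossing rate and normalized spectral entropy. The fusion rule in `eze_feature` is hypothetical: energy is scaled up when the spectrum is peaky (low entropy) and damped when the zero-crossing rate is high. It also uses a hypothetical adaptive threshold, a fixed multiple (`ratio`) of a noise-level estimate that is updated by exponential smoothing on frames judged to be non-speech. All function names, parameters (`frame_len`, `hop`, `init_frames`, `ratio`, `alpha`, `min_frames`, `zcr_weight`) and the assumption that the first frames contain only noise are illustrative choices, not the authors' method.

```python
import math


def frames(signal, frame_len=128, hop=64):
    """Split a 1-D signal into overlapping frames; a trailing partial frame is dropped."""
    return [signal[i:i + frame_len]
            for i in range(0, len(signal) - frame_len + 1, hop)]


def energy(frame):
    """Mean short-time energy of one frame."""
    return sum(s * s for s in frame) / len(frame)


def zcr(frame):
    """Zero-crossing rate: fraction of adjacent sample pairs whose signs differ."""
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)


def spectral_entropy(frame):
    """Spectral entropy normalized to [0, 1], from a naive O(N^2) DFT power spectrum."""
    n = len(frame)
    power = []
    for k in range(n // 2 + 1):
        re = sum(s * math.cos(2 * math.pi * k * t / n) for t, s in enumerate(frame))
        im = sum(s * math.sin(2 * math.pi * k * t / n) for t, s in enumerate(frame))
        power.append(re * re + im * im)  # sign of im does not affect power
    total = sum(power)
    if total == 0.0:
        return 1.0  # an all-zero frame is treated as maximally flat (noise-like)
    probs = [p / total for p in power if p > 0.0]
    return -sum(p * math.log(p) for p in probs) / math.log(len(power))


def eze_feature(frame, zcr_weight=0.5):
    """HYPOTHETICAL fusion rule (not the paper's): energy boosted by spectral
    peakiness (1 - entropy) and damped by the zero-crossing rate."""
    return energy(frame) * (1.0 - spectral_entropy(frame)) / (1.0 + zcr_weight * zcr(frame))


def detect_endpoints(signal, frame_len=128, hop=64, init_frames=10,
                     ratio=5.0, alpha=0.95, min_frames=3):
    """Return (start_sample, end_sample) speech segments using a threshold that
    tracks the background-noise level. All parameters are illustrative assumptions."""
    feats = [eze_feature(f) for f in frames(signal, frame_len, hop)]
    if not feats:
        return []
    # Assumption: the first `init_frames` frames contain only background noise.
    noise_level = sum(feats[:init_frames]) / min(init_frames, len(feats))
    segments, start = [], None
    for i, x in enumerate(feats):
        is_speech = x > ratio * noise_level
        if not is_speech:
            # Adapt the noise estimate only on frames judged to be non-speech.
            noise_level = alpha * noise_level + (1.0 - alpha) * x
        if is_speech and start is None:
            start = i
        elif not is_speech and start is not None:
            if i - start >= min_frames:  # discard very short bursts
                segments.append((start * hop, (i - 1) * hop + frame_len))
            start = None
    if start is not None and len(feats) - start >= min_frames:
        segments.append((start * hop, (len(feats) - 1) * hop + frame_len))
    return segments
```

For example, on a signal of low-level Gaussian noise with a 440 Hz tone occupying samples 2000 to 3999, `detect_endpoints` should return a single segment whose boundaries fall within one frame of the tone's start and end. Making the threshold a multiple of a smoothed noise estimate, rather than a fixed constant, is one simple way to let the decision rule follow changes in background-noise strength.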