Abstract: This paper proposes a new approach to text recognition in video. Its novelty lies mainly in color-based clustering and multiple-frame integration, applied across three phases. First, in the text detection phase, two significant features of text blocks in video, homogeneous color and dense edges, are considered jointly: color-based clustering is employed to decompose the color edge map of a video frame into several edge maps, which makes text detection more accurate. Second, in the text enhancement phase, text blocks with the same content are identified and integrated, with blurred text filtered out based on the proposed text-intensity map; this yields a clean background and clear, high-contrast text for effective text extraction. Third, in the text extraction phase, for effective binarization of a text block, instead of binarizing in a fixed color plane as existing methods do, the approach adaptively selects the best color plane for binarization according to the difference in text contrast among color planes. In addition, for effective text recognition, the color difference between text and background in video frames is considered, and color-based clustering is used to remove noise. Extensive experimental results show that this approach outperforms several existing state-of-the-art methods.
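
The adaptive color-plane selection in the extraction phase can be illustrated with a minimal sketch. The abstract does not state how text contrast is measured, so the sketch assumes Otsu's between-class variance as a stand-in contrast score; the function names `otsu` and `binarize_best_plane`, the plane labels, and the convention that 1 marks pixels brighter than the threshold are all hypothetical choices made for illustration, not the paper's method.

```python
from typing import Dict, List, Tuple

Plane = List[List[int]]  # 2-D grid of integer intensities in 0..255


def otsu(plane: Plane) -> Tuple[int, float]:
    """Return (threshold, between-class variance) for one color plane.

    The between-class variance is used here as an ASSUMED contrast score.
    """
    hist = [0] * 256
    for row in plane:
        for v in row:
            hist[v] += 1
    total = sum(hist)
    total_sum = sum(i * h for i, h in enumerate(hist))

    best_t, best_var = 0, 0.0
    w0 = 0    # number of pixels with intensity <= t
    sum0 = 0  # intensity sum of those pixels
    for t in range(256):
        w0 += hist[t]
        sum0 += t * hist[t]
        w1 = total - w0
        if w0 == 0 or w1 == 0:
            continue  # one class is empty, no split at this threshold
        mu0 = sum0 / w0
        mu1 = (total_sum - sum0) / w1
        var = (w0 / total) * (w1 / total) * (mu0 - mu1) ** 2
        if var > best_var:
            best_t, best_var = t, var
    return best_t, best_var


def binarize_best_plane(planes: Dict[str, Plane]) -> Tuple[str, Plane]:
    """Pick the plane with the highest contrast score and binarize it."""
    scored = {name: otsu(p) for name, p in planes.items()}
    best = max(scored, key=lambda name: scored[name][1])
    threshold = scored[best][0]
    # ASSUMPTION: 1 marks pixels brighter than the threshold.
    binary = [[1 if v > threshold else 0 for v in row] for row in planes[best]]
    return best, binary


# Toy 2x2 text block: the R plane separates the two pixel groups far better
# than G, and B is flat, so R should be chosen.
block = {
    "R": [[50, 200], [200, 50]],
    "G": [[90, 100], [100, 90]],
    "B": [[80, 80], [80, 80]],
}
chosen, mask = binarize_best_plane(block)
print(chosen, mask)
```

A fixed-plane method would binarize the same plane for every block; scoring each plane per block lets the choice follow whichever plane separates text from background best in that block. Whether text is the brighter or darker class would still need to be decided separately.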