Abstract: Measuring the similarity of news stories has quadratic complexity, which makes it intractable for large-volume news video. This paper proposes an efficient method to address this problem. First, small partitions are selected from the corpus and local keypoints are pruned to accelerate matching. Then, a hierarchical approach for identifying near-duplicate keyframes is proposed. Furthermore, a method is presented to identify correlations between stories based on near-duplicate keyframes and the transitivity of correlations. Finally, a method for calculating the similarity of news stories based on near-duplicate keyframes is presented. Experimental results show that this approach greatly speeds up matching and improves matching accuracy, and that the resulting story similarity is closer to users' perception.
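
To illustrate the transitivity idea, the following is a minimal sketch, not the method of this paper. It assumes that two stories are directly correlated whenever they share at least one near-duplicate keyframe, and that correlation then propagates transitively (if A correlates with B and B with C, then A correlates with C). The function name `correlated_story_groups`, the integer story indices, and the `ndk_pairs` input are hypothetical names introduced only for this example; a union-find structure is one possible way to compute the resulting groups.

```python
from collections import defaultdict


def correlated_story_groups(num_stories, ndk_pairs):
    """Group stories that are linked, directly or transitively.

    num_stories: number of stories, indexed 0..num_stories-1 (assumption).
    ndk_pairs:   iterable of (story_a, story_b) pairs that share at least
                 one near-duplicate keyframe (assumed to be given).
    """
    # Each story starts as the root of its own group.
    parent = list(range(num_stories))

    def find(x):
        # Follow parent links to the root, halving the path on the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # Merge the groups of every directly correlated pair.
    for a, b in ndk_pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    # Collect stories by the root of their group.
    groups = defaultdict(list)
    for story in range(num_stories):
        groups[find(story)].append(story)
    return sorted(groups.values())


# Stories 0 and 2 share no keyframe directly but are linked through story 1.
groups = correlated_story_groups(5, [(0, 1), (1, 2)])
```

In this sketch only pairs that already share a near-duplicate keyframe are merged, so the grouping itself avoids comparing every story with every other story.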