Abstract: The rise of video platforms has led to the rapid dissemination of videos and integrated them into many aspects of social life. Videos transmitted over the network may contain harmful content, so cyberspace security supervision urgently needs to accurately identify harmful videos that are transmitted in encrypted form. Existing methods collect traffic at major network access points, extract features from the encrypted video traffic, and identify harmful videos by matching these traffic features against harmful-video databases. However, as video transmission protocols have evolved, HTTP/2, with its new multiplexing techniques, has been widely adopted, and traditional traffic analysis methods built on HTTP/1.1 features fail to identify encrypted videos delivered over HTTP/2. Moreover, current research mostly focuses on videos played at a fixed resolution, and few studies have considered the impact of resolution switching on video identification. To address these problems, this study analyzes the factors that cause length offsets in audio/video data during HTTP/2 transmission and proposes a method that precisely reconstructs corrected fingerprints for encrypted videos by calculating the size of the combined audio and video segments in the encrypted traffic. The study also proposes an encrypted video identification model based on the hidden Markov model and the Viterbi algorithm, which uses the corrected fingerprints of encrypted videos together with a large plaintext video fingerprint database. The model applies dynamic programming to solve the problems caused by adaptive resolution switching. On fingerprint databases on the order of 400,000 entries built from Facebook and Instagram, the proposed model achieves identification accuracies of 98.41% for encrypted videos with fixed resolution and 97.91% for those with adaptive resolution.
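The following is a minimal sketch, not the authors' implementation, of how fingerprint correction and Viterbi-based matching could fit together. It assumes that the only length offset is the 9-byte HTTP/2 frame header, with the default 16,384-byte maximum frame payload. Real traffic also carries TLS record overhead and the other factors the study analyzes. It further assumes that each plaintext fingerprint entry is the combined audio+video size of one segment and that emissions follow a simple Gaussian-like relative-error model. The names `correct_size` and `viterbi_match` and the parameters `p_stay` and `rel_tol` are hypothetical.

```python
import math
from typing import Dict, List, Tuple

FRAME_HEADER = 9           # HTTP/2 frame header size in bytes
MAX_FRAME_PAYLOAD = 16384  # default SETTINGS_MAX_FRAME_SIZE


def correct_size(observed: int) -> int:
    """Strip assumed HTTP/2 frame-header overhead from an observed segment size.

    Assumption: payload p was sent as ceil(p / 16384) DATA frames, each with a
    9-byte header, and nothing else (TLS, padding) inflated the size.
    """
    frames = math.ceil(observed / (MAX_FRAME_PAYLOAD + FRAME_HEADER))
    return observed - FRAME_HEADER * frames


def viterbi_match(observed: List[int],
                  fingerprints: Dict[str, List[int]],
                  p_stay: float = 0.9,
                  rel_tol: float = 0.02) -> Tuple[float, List[str]]:
    """Return (log-score, resolution path) of the best-matching resolution sequence.

    Hidden states are resolutions; observations are corrected segment sizes.
    fingerprints[res][t] is the hypothetical plaintext size of segment t at res.
    """
    states = list(fingerprints)
    n = len(states)
    p_switch = (1 - p_stay) / (n - 1) if n > 1 else 0.0

    def log_trans(a: str, b: str) -> float:
        p = p_stay if a == b else p_switch
        return math.log(p) if p > 0 else -math.inf

    def log_emit(state: str, t: int, size: int) -> float:
        # Penalize relative deviation from the plaintext fingerprint.
        expected = fingerprints[state][t]
        z = (size - expected) / (rel_tol * expected)
        return -0.5 * z * z

    sizes = [correct_size(o) for o in observed]
    # Uniform prior over starting resolutions.
    score = {s: -math.log(n) + log_emit(s, 0, sizes[0]) for s in states}
    back: List[Dict[str, str]] = []
    for t in range(1, len(sizes)):
        new_score, ptr = {}, {}
        for s in states:
            # Dynamic programming step: best predecessor for state s at time t.
            prev = max(states, key=lambda q: score[q] + log_trans(q, s))
            new_score[s] = score[prev] + log_trans(prev, s) + log_emit(s, t, sizes[t])
            ptr[s] = prev
        score = new_score
        back.append(ptr)

    # Backtrack from the best final state.
    last = max(states, key=lambda s: score[s])
    path = [last]
    for ptr in reversed(back):
        path.append(ptr[path[-1]])
    return score[last], path[::-1]


if __name__ == "__main__":
    fps = {"480p": [100000, 120000, 110000, 130000],
           "720p": [200000, 240000, 220000, 260000]}
    # Simulated capture: playback switches from 480p to 720p at segment 2.
    payloads = [100000, 120000, 220000, 260000]
    observed = [p + FRAME_HEADER * math.ceil(p / MAX_FRAME_PAYLOAD) for p in payloads]
    _, path = viterbi_match(observed, fps)
    print(path)  # ['480p', '480p', '720p', '720p']
```

In a full identification system, a sketch like this would be run against every candidate video in the fingerprint database, and the candidate with the highest score would be selected. The `p_stay` parameter controls how strongly the model resists spurious resolution switches.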
The generality and generalization ability of the proposed method are further validated on three other video platforms: Triller, Twitter, and Mango TV. Comparisons with similar work in terms of identification effectiveness, generalization, and time overhead show that the proposed method has higher practical application value.