Abstract: With the rapid development of the Internet and big data, data are growing in both scale and variety. Video, an important carrier of information, is becoming increasingly prevalent, particularly with the recent growth of short videos, and understanding and analyzing large-scale video has become an active research topic. Entity linking connects mentions in content to entities in a knowledge base, and in doing so enriches that content with a wealth of external background knowledge. Applied to video, it can effectively support understanding of video content and, in turn, its classification, retrieval, and recommendation. However, existing video entity linking datasets and methods operate at too coarse a granularity. This study therefore proposes a fine-grained video entity linking approach focused on live-streaming scenarios and constructs a fine-grained video entity linking dataset. To address the challenges of this fine-grained task, it further proposes using large models to extract entities and their attributes from videos, and using contrastive learning to obtain better representations of videos and their corresponding entities. The results demonstrate that the proposed method can effectively handle fine-grained entity linking in videos.