Given a video containing a multi-person scene, group activity recognition model needs to recognize the group activity that multiple people in video are completing together. Group activity recognition is an important problem in video understanding and can be applied to sports videos analysis, surveillance video recognition, social behavior understanding, and other real scenarios. Multi-person scene video is complicated, and the spatial-temporal information is rich, which requires the model to extract key information. To accurately recognize group activity, the model should efficiently model the hierarchical relationships in the scene and extract distinguishing spatial-temporal features for people. Due to its wide range of application requirements, the problem of group activity recognition has received extensive attention from researchers. This study has conducted an in-depth analysis of a large number of research work on group activity recognition in recent years, and summarized the main challenges of group activity recognition research, systematically summarized six types of group activity recognition methods, including traditional non-deep learning recognition methods and recognition methods based on deep learning technology, and proposed the possible directions of future research.