Abstract: Speech emotion recognition is a key part of affective computing and plays an important role in human-computer interaction. Accurately distinguishing emotions helps machines understand users’ intentions and respond more appropriately, which improves interactivity and the user experience. This study reviews the theories and methods of speech emotion recognition, focusing on discrete speech emotions. First, it reviews the development of emotion recognition and presents an architecture for speech emotion recognition that summarizes research progress. Second, it introduces emotion representation models and commonly used corpora, which form the foundation of speech emotion recognition. Third, it outlines the speech emotion recognition process, including feature extraction and recognition models, with a focus on traditional classification models, classical deep models, and other advanced models. It also introduces commonly used evaluation metrics and applies them to summarize the reviewed models. Finally, it discusses the open challenges in speech emotion recognition and suggests possible directions for future research.