Abstract: Speech emotion recognition is an important research area in human-computer interaction (HCI). Speech emotion recognition systems used in intervention therapy for autistic children are helpful for their rehabilitation. However, speech emotion features are highly variable and complex, and extracting them is itself a challenging task, which makes it difficult to improve the recognition performance of the whole system. To address this problem, this paper proposes a new speech emotion feature extraction method that uses an unsupervised auto-encoding network to learn emotional features from the speech signal automatically. A 3-layer auto-encoding network is constructed to extract speech emotion features, and the resulting high-level features are used as the input to an extreme learning machine (ELM) classifier, which makes the final recognition. The system achieves a speech emotion recognition rate of 84.14%, which is higher than that of the traditional method based on human-defined feature extraction.
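The pipeline described above (unsupervised auto-encoder features feeding an extreme learning machine) can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. It assumes that the "3-layer" network means input, hidden code and reconstruction layers, and it uses sigmoid hidden units, a linear decoder and full-batch gradient descent on squared reconstruction error. The classifier is the basic ELM formulation: random, untrained hidden weights, with output weights solved by the Moore-Penrose pseudo-inverse. The names train_autoencoder, encode and ELM, the layer sizes and the learning rate are all hypothetical.

```python
# Minimal sketch under stated assumptions; not the paper's implementation.
import numpy as np


def sigmoid(z):
    # Clip to avoid overflow in np.exp for large |z|.
    return 1.0 / (1.0 + np.exp(-np.clip(z, -50.0, 50.0)))


def train_autoencoder(X, n_hidden, lr=0.05, epochs=300, seed=0):
    """Fit an input -> hidden -> reconstruction auto-encoder (assumed 3-layer layout).

    Sigmoid code layer, linear reconstruction layer, full-batch gradient
    descent on 0.5 * squared reconstruction error averaged over samples.
    Returns the encoder parameters and the per-epoch loss history.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    W1 = rng.normal(0.0, 0.1, (d, n_hidden))
    b1 = np.zeros(n_hidden)
    W2 = rng.normal(0.0, 0.1, (n_hidden, d))
    b2 = np.zeros(d)
    losses = []
    for _ in range(epochs):
        H = sigmoid(X @ W1 + b1)          # encode
        X_hat = H @ W2 + b2               # decode (reconstruct the input)
        diff = X_hat - X
        losses.append(0.5 * np.sum(diff ** 2) / n)
        g = diff / n                      # gradient of the loss w.r.t. X_hat
        gW2, gb2 = H.T @ g, g.sum(axis=0)
        dZ = (g @ W2.T) * H * (1.0 - H)   # back-propagate through the sigmoid
        gW1, gb1 = X.T @ dZ, dZ.sum(axis=0)
        W1 -= lr * gW1
        b1 -= lr * gb1
        W2 -= lr * gW2
        b2 -= lr * gb2
    return W1, b1, losses


def encode(X, W1, b1):
    """High-level features = hidden-layer activations of the trained auto-encoder."""
    return sigmoid(X @ W1 + b1)


class ELM:
    """Basic extreme learning machine: random hidden layer, least-squares output layer."""

    def __init__(self, n_hidden=100, seed=0):
        self.n_hidden = n_hidden
        self.seed = seed

    def _hidden(self, X):
        return sigmoid(X @ self.W + self.b)

    def fit(self, X, y):
        rng = np.random.default_rng(self.seed)
        self.classes_ = np.unique(y)
        T = (y[:, None] == self.classes_[None, :]).astype(float)  # one-hot targets
        # Input weights and biases are drawn at random and never trained.
        self.W = rng.normal(size=(X.shape[1], self.n_hidden))
        self.b = rng.normal(size=self.n_hidden)
        # Output weights: least-squares solution beta = pinv(H) @ T.
        self.beta = np.linalg.pinv(self._hidden(X)) @ T
        return self

    def predict(self, X):
        scores = self._hidden(X) @ self.beta
        return self.classes_[np.argmax(scores, axis=1)]
```

In use, per-utterance acoustic feature vectors would be standardized, passed to train_autoencoder, converted to high-level features with encode, and classified with ELM(...).fit(features, labels).predict(...). The choice of input features is itself an assumption, since the abstract does not state what the network receives. The ELM step needs no iterative training, because only the output weights are fitted, in a single pseudo-inverse solve.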