Dimensional Speech Emotion Recognition Review
Author: Li Haifeng, Chen Jing, Ma Lin, Bo Hongjian, Xu Cong, Li Hongwei
Affiliation:

Fund Project:

National Key Research and Development Program of China (2018YFC0806800); National Natural Science Foundation of China (61671187); Shenzhen Foundational Research Fund (JCYJ20180507183608379); Key Laboratory Project of Innovation Environment Construction Plan of Shenzhen Municipality (ZDSYS201707311437102); Open Fund of MOE-Microsoft Key Laboratory of Natural Language Processing and Speech (HIT.KLOF.20160xx); Applied Basic Research Programs (CJN13J004); Basic Research and Application Programs Foundation of Guangdong Province (2019A1515111179)

    Abstract:

    Emotion recognition is an interdisciplinary research field that draws on cognitive science, psychology, signal processing, pattern recognition, artificial intelligence, and related areas, with the aim of helping computers understand human emotional states so as to realize natural human-computer interaction. In this survey, the psychological theories of emotion are first introduced as the theoretical basis for the emotion models used in emotion recognition, including appraisal theories, dimensional models of emotion, brain mechanisms, and computational models. Then, the state of the art in dimensional emotion recognition is presented in detail from the artificial intelligence perspective, covering speech emotion corpora, feature extraction, and classification. Finally, the challenges of dimensional emotion recognition are discussed, and workable solutions and future research directions are proposed.
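    Unlike categorical recognition, dimensional emotion recognition is usually framed as regression over continuous annotations such as arousal and valence, and a common evaluation metric in this setting is the concordance correlation coefficient (CCC), used for instance in the AVEC challenge series covered by this survey. The snippet below is a minimal illustrative sketch, not code from any surveyed system; the function name `concordance_cc` is our own.

```python
import numpy as np

def concordance_cc(pred, gold):
    """Concordance correlation coefficient between predicted and gold
    continuous annotations (e.g. an arousal or valence trace)."""
    pred = np.asarray(pred, dtype=float)
    gold = np.asarray(gold, dtype=float)
    mp, mg = pred.mean(), gold.mean()
    vp, vg = pred.var(), gold.var()          # population variances (ddof=0)
    cov = ((pred - mp) * (gold - mg)).mean()  # population covariance
    return 2 * cov / (vp + vg + (mp - mg) ** 2)

# Perfect agreement yields 1.0; a constant prediction yields 0.0.
print(round(concordance_cc([0.1, 0.4, 0.7], [0.1, 0.4, 0.7]), 3))  # → 1.0
```

    Unlike Pearson correlation, CCC also penalizes systematic bias and scale differences between prediction and annotation, which is why it is preferred for continuous emotion traces.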

Get Citation

Li HF, Chen J, Ma L, Bo HJ, Xu C, Li HW. Dimensional speech emotion recognition review. Ruan Jian Xue Bao/Journal of Software, 2020,31(8):2465-2491 (in Chinese with English abstract).
Article Metrics
  • Abstract: 3660
  • PDF: 12048
  • HTML: 6060
  • Cited by: 0
History
  • Received: June 30, 2019
  • Revised: September 24, 2019
  • Online: May 26, 2020
  • Published: August 06, 2020