Emotion in Speech

Speech Emotion Recognition

Sponsor: IRCSET (Embark Initiative)

It is well known that much of the information conveyed in speech is non-verbal. In the past, however, speech recognition has focused almost exclusively on the words that were spoken, while disregarding the emotional content. Automatic recognition of emotion from speech has many potential applications, from the design of more user-friendly human-machine interfaces to the improvement of speech recognition for natural speech. As this is a relatively young field, there remains uncertainty in the literature over the best classifier architectures and feature sets for emotion classification, and even over how emotion should be represented within the classifier framework.

Early work in emotion recognition focused on the recognition of discrete emotional states (anger, joy, fear, etc.). However, there is a growing belief that it may be more useful and powerful to classify speech along affective dimensions (activation, valence, etc.). By associating discrete emotions with regions within this activation-valence space, we should be able to represent a wider range of emotion more efficiently.
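The mapping from the dimensional representation back to discrete emotions can be illustrated with a simple nearest-region lookup. The sketch below is purely illustrative: the centroid coordinates assigned to each emotion are hypothetical placeholders, not values from this project.

```python
import math

# Hypothetical centroids of discrete emotions in the activation-valence
# plane (activation first, valence second); coordinates are illustrative only.
EMOTION_CENTROIDS = {
    "anger":   ( 0.8, -0.6),
    "joy":     ( 0.7,  0.7),
    "fear":    ( 0.6, -0.7),
    "sadness": (-0.6, -0.6),
    "calm":    (-0.7,  0.5),
}

def nearest_emotion(activation, valence):
    """Map a point in activation-valence space to the closest discrete emotion."""
    return min(EMOTION_CENTROIDS,
               key=lambda e: math.dist((activation, valence), EMOTION_CENTROIDS[e]))
```

A classifier that outputs continuous (activation, valence) scores can then report, for example, `nearest_emotion(0.75, 0.65)` as "joy", while still retaining the richer dimensional description internally.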

To date, we have used Hidden Markov Model (HMM) classifiers to explore the classification of speech along four affective dimensions (activation, expectation, power and valence), and have compared classification performance using different feature sets. Work is currently ongoing to extend this study in order to gain a more in-depth insight into how emotion is captured by different spectral, prosodic, and spectro-temporal features. In the future we intend to explore alternative classifiers (for example Gaussian Mixture Models or Support Vector Machines), and to eventually move toward continuous recognition of emotion.
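The basic HMM classification scheme can be sketched as follows: one model is trained per class along an affective dimension, and a test sequence is assigned to the class whose model gives it the highest likelihood (computed with the forward algorithm). The toy models below are hand-set, not trained, and the two-symbol observations stand in for real acoustic feature frames; everything here is an illustrative assumption, not this project's actual system.

```python
import math

def forward_loglik(obs, pi, A, B):
    """Log-likelihood of a discrete observation sequence under an HMM
    (pi: initial state probs, A: transition matrix, B: per-state emission
    probs), via the forward algorithm (unscaled; fine for short sequences)."""
    n = len(pi)
    alpha = [pi[s] * B[s][obs[0]] for s in range(n)]
    for o in obs[1:]:
        alpha = [sum(alpha[r] * A[r][s] for r in range(n)) * B[s][o]
                 for s in range(n)]
    return math.log(sum(alpha))

# Toy per-class models for one affective dimension (activation):
# symbol 1 ~ an "energetic" frame, symbol 0 ~ a "subdued" frame.
MODELS = {
    "high_activation": ([0.6, 0.4],
                        [[0.7, 0.3], [0.4, 0.6]],
                        [[0.1, 0.9], [0.3, 0.7]]),   # states favour symbol 1
    "low_activation":  ([0.6, 0.4],
                        [[0.7, 0.3], [0.4, 0.6]],
                        [[0.9, 0.1], [0.7, 0.3]]),   # states favour symbol 0
}

def classify(obs):
    """Assign the sequence to the class whose HMM scores it highest."""
    return max(MODELS, key=lambda c: forward_loglik(obs, *MODELS[c]))
```

In a real system the emissions would be continuous (e.g. Gaussian mixtures over spectral or prosodic features) and the parameters estimated from labelled data, but the maximum-likelihood decision rule is the same.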

Page last modified on May 16, 2012