University of Surrey

Speaker-dependent audio-visual emotion recognition

Haq, S and Jackson, PJB (2009) Speaker-dependent audio-visual emotion recognition

Text: HaqJackson_AVSP09.pdf (2MB). Restricted to Repository staff only. Available under licence: see the attached licence file.
Text (licence): licence.txt (1kB). Restricted to Repository staff only.

Abstract

This paper explores the recognition of expressed emotion from speech and facial gestures in the speaker-dependent case. Experiments were performed on an English audio-visual emotional database consisting of 480 utterances from 4 English male actors in 7 emotions. A total of 106 audio and 240 visual features were extracted, and features were selected with the Plus l-Take Away r algorithm using the Bhattacharyya distance as the selection criterion. Two linear transformation methods, principal component analysis (PCA) and linear discriminant analysis (LDA), were applied to the selected features, and Gaussian classifiers were used for classification. Performance was higher with LDA features than with PCA features. The visual features performed better than the audio features, and overall performance improved with the audio-visual features. With 7 emotion classes, an average recognition rate of 56% was achieved with the audio features, 95% with the visual features and 98% with the audio-visual features selected by Bhattacharyya distance and transformed by LDA. With the emotions grouped into 4 classes, an average recognition rate of 69% was achieved with the audio features, 98% with the visual features and 98% with the audio-visual features fused at decision level. These results were comparable to the measured human recognition rate on this multimodal data set.
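
As a rough illustration of two ideas named in the abstract, the following minimal sketch computes the Bhattacharyya distance between two class distributions for a single feature and fuses per-modality Gaussian classifier outputs at decision level. It is not the authors' implementation. The univariate simplification, the equal class priors, the product rule used for fusion, and all function names and numbers are assumptions made for illustration only; the abstract does not state which fusion rule was used.

```python
import math


def bhattacharyya_distance(mu1, var1, mu2, var2):
    """Bhattacharyya distance between two univariate Gaussians.

    A larger value means the feature separates the two classes better,
    which is how a distance-based selection criterion would rank it.
    """
    avg_var = 0.5 * (var1 + var2)
    mean_term = 0.125 * (mu1 - mu2) ** 2 / avg_var
    var_term = 0.5 * math.log(avg_var / math.sqrt(var1 * var2))
    return mean_term + var_term


def gaussian_log_likelihood(x, mu, var):
    """Log density of a univariate Gaussian at x."""
    return -0.5 * (math.log(2.0 * math.pi * var) + (x - mu) ** 2 / var)


def class_posteriors(x, class_params):
    """Posterior over classes for one feature value.

    class_params maps a label to (mean, variance). Equal class priors
    are assumed (an illustrative choice, not taken from the paper).
    """
    logs = {c: gaussian_log_likelihood(x, mu, var)
            for c, (mu, var) in class_params.items()}
    peak = max(logs.values())  # subtract the max for numerical stability
    weights = {c: math.exp(v - peak) for c, v in logs.items()}
    total = sum(weights.values())
    return {c: w / total for c, w in weights.items()}


def fuse_decisions(audio_post, visual_post):
    """Decision-level fusion with the product rule (an assumed rule)."""
    fused = {c: audio_post[c] * visual_post[c] for c in audio_post}
    total = sum(fused.values())
    return {c: p / total for c, p in fused.items()}


if __name__ == "__main__":
    # Hypothetical class models for one audio and one visual feature.
    audio_models = {"anger": (1.0, 1.0), "sadness": (-1.0, 1.0)}
    visual_models = {"anger": (2.0, 0.5), "sadness": (-2.0, 0.5)}

    audio_post = class_posteriors(0.2, audio_models)
    visual_post = class_posteriors(-1.5, visual_models)
    fused = fuse_decisions(audio_post, visual_post)
    print(max(fused, key=fused.get))
```

In this toy setting the visual feature has the larger Bhattacharyya distance between the two classes, so it dominates the fused decision. The multivariate case would use class mean vectors and covariance matrices in place of the scalar means and variances.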

Item Type: Conference or Workshop Item (UNSPECIFIED)
Divisions : Faculty of Engineering and Physical Sciences > Electronic Engineering > Centre for Vision Speech and Signal Processing
Authors :
Name            Email          ORCID
Haq, S          UNSPECIFIED    UNSPECIFIED
Jackson, PJB    UNSPECIFIED    UNSPECIFIED
Date : September 2009
Uncontrolled Keywords : audio-visual emotion, data evaluation, linear transformation, speaker-dependent
Depositing User : Symplectic Elements
Date Deposited : 28 Mar 2017 14:58
Last Modified : 31 Oct 2017 14:12
URI: http://epubs.surrey.ac.uk/id/eprint/7731


© The University of Surrey, Guildford, Surrey, GU2 7XH, United Kingdom.
+44 (0)1483 300800