University of Surrey


Audio Head Pose Estimation using the Direct to Reverberant Speech Ratio

Barnard, M and Wang, W (2016) Audio Head Pose Estimation using the Direct to Reverberant Speech Ratio. Speech Communication.

BarnardW_SpeechComm_2016_postprint.pdf - Accepted version Manuscript
Restricted to Repository staff only until 28 March 2018.
Available under License : See the attached licence file.



Head pose is an important cue in many applications, such as speech recognition and face recognition. Most approaches to head pose estimation to date have focussed on the use of visual information of a subject’s head. These visual approaches have a number of limitations, such as an inability to cope with occlusions, changes in the appearance of the head, and low-resolution images. We present here a novel method for determining coarse head pose orientation purely from audio information, exploiting the direct-to-reverberant speech energy ratio (DRR) within a reverberant room environment. Our hypothesis is that a speaker facing towards a microphone will have a higher DRR and a speaker facing away from the microphone will have a lower DRR. This method has the advantage of actually exploiting the reverberations within a room rather than trying to suppress them. It also has the practical advantage that most enclosed living spaces, such as meeting rooms or offices, are highly reverberant environments. To test this hypothesis, we also present a new data set featuring 56 subjects recorded in three different rooms with different acoustic properties, adopting 8 different head poses in 4 different room positions, captured with a 16-element microphone array. As far as the authors are aware, this data set is unique and will make a significant contribution to further work in the area of audio head pose estimation. Using this data set, we demonstrate that our proposed method of using the DRR for audio head pose estimation provides a significant improvement over previous methods.
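The hypothesis in the abstract can be illustrated with a minimal sketch. It uses the common textbook definition of DRR computed from a room impulse response, which is not necessarily how the paper estimates DRR from recorded speech. The function names, the 2.5 ms direct-path window, and the toy impulse response are all assumptions made for illustration; a speaker turning away from the microphone is modelled here simply as a weaker direct path with an unchanged reverberant tail.

```python
import math


def drr_db(impulse_response, sample_rate, direct_window_ms=2.5):
    """Direct-to-reverberant ratio in dB of a room impulse response.

    Energy within +/- direct_window_ms of the strongest peak is treated as
    direct sound; everything after that window is treated as reverberant.
    The window length is an illustrative assumption.
    """
    peak = max(range(len(impulse_response)),
               key=lambda i: abs(impulse_response[i]))
    half = int(round(direct_window_ms * 1e-3 * sample_rate))
    start = max(0, peak - half)
    end = min(len(impulse_response), peak + half + 1)
    direct = sum(x * x for x in impulse_response[start:end])
    reverb = sum(x * x for x in impulse_response[end:])
    if direct == 0 or reverb == 0:
        raise ValueError("impulse response has no direct or no reverberant energy")
    return 10.0 * math.log10(direct / reverb)


def synthetic_rir(direct_gain, length=4000, delay=80, decay=0.999):
    """Hypothetical toy impulse response: one direct spike plus a decaying tail."""
    h = [0.0] * length
    h[delay] = direct_gain
    for n in range(delay + 200, length):
        sign = 1.0 if n % 2 == 0 else -1.0
        h[n] = 0.05 * sign * decay ** (n - delay)
    return h


SAMPLE_RATE = 16000  # Hz, assumed

# Assumption: facing away attenuates only the direct path.
facing = drr_db(synthetic_rir(direct_gain=1.0), SAMPLE_RATE)
away = drr_db(synthetic_rir(direct_gain=0.2), SAMPLE_RATE)

print(f"DRR facing microphone: {facing:.1f} dB")
print(f"DRR facing away:       {away:.1f} dB")
```

In this toy setup the reverberant energy is identical in both cases, so the difference in DRR comes entirely from the direct-path gain. In a real room the reverberant field also changes with head orientation, which is why a measured data set such as the one described above is needed to test the hypothesis.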

Item Type: Article
Subjects : Electronic Engineering
Divisions : Faculty of Engineering and Physical Sciences > Electronic Engineering > Centre for Vision Speech and Signal Processing
Authors : Barnard, M and Wang, W
Date : 28 September 2016
Funders : EPSRC
Identification Number : 10.1016/j.specom.2016.09.005
Copyright Disclaimer : © 2016. This manuscript version is made available under the CC-BY-NC-ND 4.0 license
Depositing User : Symplectic Elements
Date Deposited : 27 Sep 2016 09:22
Last Modified : 31 Oct 2017 18:45


© The University of Surrey, Guildford, Surrey, GU2 7XH, United Kingdom.
+44 (0)1483 300800