Audio head pose estimation using the direct to reverberant speech ratio
Barnard, M, Wang, W and Kittler, J (2013) Audio head pose estimation using the direct to reverberant speech ratio
Abstract
Head pose is an important cue in many applications, such as speech recognition and face recognition. Most approaches to head pose estimation to date have used visual information to model and recognise a subject's head in different configurations. These approaches have a number of limitations, such as an inability to cope with occlusions, changes in the appearance of the head, and low-resolution images. We present here a novel method for determining coarse head pose orientation purely from audio information, exploiting the direct to reverberant speech energy ratio (DRR) within a highly reverberant meeting room environment. Our hypothesis is that a speaker facing towards a microphone will have a higher DRR and a speaker facing away from the microphone will have a lower DRR. This hypothesis is confirmed by experiments conducted on the publicly available AV16.3 database. © 2013 IEEE.
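To make the hypothesis concrete, the following is a minimal illustrative sketch and not the authors' estimation method, which this record does not describe. It uses the common textbook definition of DRR as the ratio of energy in a short window around the direct-path peak of a room impulse response to the energy arriving afterwards. The function names, the 2.5 ms window, the 0.8 s reverberation time, and the synthetic impulse responses are all assumptions chosen for illustration; the paper itself works from recorded speech in the AV16.3 database.

```python
import math
import random


def drr_db(h, fs, direct_window_ms=2.5):
    """Direct-to-reverberant ratio in dB of an impulse response h sampled at fs Hz.

    Assumption: the direct path is the largest-magnitude sample, and everything
    after a short window around it counts as reverberation.
    """
    peak = max(range(len(h)), key=lambda i: abs(h[i]))
    half = int(round(direct_window_ms * 1e-3 * fs))
    start = max(peak - half, 0)
    stop = min(peak + half + 1, len(h))
    direct_energy = sum(x * x for x in h[start:stop])
    reverb_energy = sum(x * x for x in h[stop:])
    return 10.0 * math.log10(direct_energy / reverb_energy)


def synthetic_rir(direct_gain, fs=16000, rt60=0.8, length_s=1.0, delay=80, seed=0):
    """Hypothetical impulse response: one direct-path spike plus a decaying noise tail.

    direct_gain stands in for how strongly the talker radiates towards the
    microphone; the reverberant tail is kept identical for a given seed.
    """
    rng = random.Random(seed)
    n = int(length_s * fs)
    h = [0.0] * n
    for i in range(delay + 1, n):
        # Amplitude envelope that has decayed by 60 dB after rt60 seconds.
        decay = 10.0 ** (-3.0 * (i / fs) / rt60)
        h[i] = 0.02 * rng.gauss(0.0, 1.0) * decay
    h[delay] = direct_gain
    return h


# Same room tail, different direct-path strength (both gains are illustrative).
toward = drr_db(synthetic_rir(1.0), 16000)
away = drr_db(synthetic_rir(0.3), 16000)
print(f"facing toward: {toward:.1f} dB, facing away: {away:.1f} dB")
```

In this toy setting the reverberant tail is held fixed, so attenuating the direct path, as would happen when a talker turns away from the microphone, lowers the DRR. This matches the direction of the effect stated in the abstract but says nothing about its size in real recordings.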
Item Type: | Conference or Workshop Item (UNSPECIFIED) |
---|---|
Divisions : | Surrey research (other units) |
Authors : | Barnard, M, Wang, W and Kittler, J |
Date : | 18 October 2013 |
DOI : | 10.1109/ICASSP.2013.6639234 |
Depositing User : | Symplectic Elements |
Date Deposited : | 28 Mar 2017 15:53 |
Last Modified : | 23 Jan 2020 13:06 |
URI: | http://epubs.surrey.ac.uk/id/eprint/806078 |