University of Surrey

Using Audio-Visual Features for Robust Voice Activity Detection in Clean and Noisy Speech

Almajai, I and Milner, B. Using Audio-Visual Features for Robust Voice Activity Detection in Clean and Noisy Speech. In: The European Signal Processing Conference, 2008-08-25 - ?, Geneva, Switzerland.

Full text not available from this repository.

Abstract

The aim of this work is to utilize both audio and visual speech information to create a robust voice activity detector (VAD) that operates in both clean and noisy speech. A statistical audio-only VAD is developed first, using MFCC vectors as input. A visual-only VAD is then produced, which uses 2-D discrete cosine transform (DCT) visual features. The two VADs are then integrated into an audio-visual VAD (AV-VAD). A weighting term is introduced to vary the contribution of the audio and visual components according to the input signal-to-noise ratio (SNR). Experimental results first establish the optimal configuration of the classifier and show that higher accuracy is obtained when temporal derivatives are included. Tests in white noise down to an SNR of -20 dB show the AV-VAD to be highly robust, with accuracy remaining above 97%. Comparison with the ETSI Aurora VAD shows the AV-VAD to be significantly more accurate.
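
The SNR-dependent weighting described in the abstract can be illustrated with a minimal sketch. The abstract does not give the form of the weighting term, so everything below is an assumption made for illustration: the linear ramp in `audio_weight`, its `low_db` and `high_db` limits, the use of per-frame log-likelihood ratios as scores, and the zero decision threshold are all hypothetical and are not taken from the paper.

```python
def audio_weight(snr_db, low_db=-20.0, high_db=20.0):
    """Map an SNR estimate (dB) to an audio weight in [0, 1].

    Hypothetical linear ramp: 0 at low_db (trust video only),
    1 at high_db (trust audio only), clipped outside that range.
    """
    if high_db <= low_db:
        raise ValueError("high_db must be greater than low_db")
    alpha = (snr_db - low_db) / (high_db - low_db)
    return min(1.0, max(0.0, alpha))


def fuse_scores(audio_llr, visual_llr, snr_db):
    """Weighted sum of per-frame audio and visual speech scores.

    Both scores are assumed to be log-likelihood ratios
    (positive means speech is more likely than non-speech).
    """
    alpha = audio_weight(snr_db)
    return alpha * audio_llr + (1.0 - alpha) * visual_llr


def av_vad_decision(audio_llr, visual_llr, snr_db, threshold=0.0):
    """Return True if the fused score labels the frame as speech."""
    return fuse_scores(audio_llr, visual_llr, snr_db) > threshold


if __name__ == "__main__":
    # Noisy frame: the audio score says non-speech, the visual score says speech.
    print(av_vad_decision(audio_llr=-3.0, visual_llr=1.0, snr_db=-20.0))
    # Same scores in clean conditions: the audio score dominates.
    print(av_vad_decision(audio_llr=-3.0, visual_llr=1.0, snr_db=20.0))
```

In this sketch the visual stream takes over as the SNR falls, which matches the motivation stated in the abstract: the visual features are unaffected by acoustic noise, so their contribution should grow when the audio becomes unreliable.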

Item Type: Conference or Workshop Item (UNSPECIFIED)
Authors:
Name          Email                     ORCID
Almajai, I    i.almajai@surrey.ac.uk    UNSPECIFIED
Milner, B     UNSPECIFIED               UNSPECIFIED
Depositing User: Symplectic Elements
Date Deposited: 17 May 2017 11:56
Last Modified: 17 May 2017 11:56
URI: http://epubs.surrey.ac.uk/id/eprint/833320

© The University of Surrey, Guildford, Surrey, GU2 7XH, United Kingdom.
+44 (0)1483 300800