University of Surrey


Blind source separation and visual voice activity detection for target speech extraction

Liu, Q and Wang, W (2011) Blind source separation and visual voice activity detection for target speech extraction. In: iCAST 2011, 2011-09-27 - 2011-09-30, Dalian, China.


Abstract

Although blind source separation (BSS) has been studied extensively, its performance is still limited, especially for sensor data collected in adverse environments. Recent studies show that this limitation can be mitigated by incorporating multimodal information into the BSS process. In this paper, we propose a method for enhancing the target speech separated by a BSS algorithm from sound mixtures, using visual voice activity detection (VAD) and spectral subtraction. First, a classifier for visual VAD is formed in the off-line training stage, using labelled features extracted from the visual stimuli. Then, we use this visual VAD classifier to detect the voice activity of the target speech. Finally, we apply a multi-band spectral subtraction algorithm to enhance the BSS-separated speech signal based on the detected voice activity. We have tested our algorithm on mixtures generated artificially using mixing filters with different reverberation times, and the results show that our algorithm improves the quality of the separated target signal. © 2011 IEEE.
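
The final stage described in the abstract, spectral subtraction driven by detected voice activity, can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names `stft` and `vad_guided_subtraction`, the equal-width frequency bands, the over-subtraction rule and the spectral floor are all assumptions chosen for illustration, and the visual VAD classifier is stood in for by a ready-made per-frame boolean mask.

```python
import numpy as np


def stft(x, frame_len=512, hop=256):
    """Hann-windowed short-time Fourier transform, shape (frames, bins)."""
    window = np.hanning(frame_len)
    n_frames = 1 + (len(x) - frame_len) // hop
    frames = np.stack([x[i * hop:i * hop + frame_len] * window
                       for i in range(n_frames)])
    return np.fft.rfft(frames, axis=1)


def vad_guided_subtraction(spec, voice_active, n_bands=4, floor=0.01):
    """Band-wise spectral subtraction guided by per-frame VAD decisions.

    spec         : complex spectrogram of the BSS-separated signal
    voice_active : one boolean per frame (True = target is speaking);
                   hypothetical output of a visual VAD classifier
    n_bands      : number of equal-width bands (an assumption)
    floor        : fraction of the input power kept as a spectral floor
    """
    spec = np.asarray(spec)
    voice_active = np.asarray(voice_active, dtype=bool)
    if voice_active.shape[0] != spec.shape[0]:
        raise ValueError("need exactly one VAD decision per frame")
    if voice_active.all():
        return spec.copy()  # no silent frames, so nothing to estimate from

    power = np.abs(spec) ** 2
    # Frames flagged silent contain only residual interference, so their
    # average power serves as the interference estimate.
    noise = power[~voice_active].mean(axis=0)

    enhanced = np.empty_like(power)
    for band in np.array_split(np.arange(power.shape[1]), n_bands):
        # Band SNR in dB, measured over the whole utterance.
        ratio = power[:, band].sum() / (power.shape[0] * noise[band].sum() + 1e-12)
        snr_db = 10 * np.log10(ratio + 1e-12)
        # Assumed heuristic: subtract more aggressively in low-SNR bands.
        alpha = np.clip(4.0 - 0.15 * snr_db, 1.0, 5.0)
        cleaned = power[:, band] - alpha * noise[band]
        # The floor prevents negative power and limits musical noise.
        enhanced[:, band] = np.maximum(cleaned, floor * power[:, band])

    # Recombine the enhanced magnitude with the original phase.
    return np.sqrt(enhanced) * np.exp(1j * np.angle(spec))
```

In this sketch the VAD mask must have one entry per STFT frame, so video-rate decisions would need to be resampled to the audio frame rate before being passed in.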

Item Type: Conference or Workshop Item (Paper)
Additional Information:

Copyright 2011 IEEE. Personal use of this material is permitted. However, permission to reprint/republish this material for advertising or promotional purposes or for creating new collective works for resale or redistribution to servers or lists, or to reuse any copyrighted component of this work in other works must be obtained from the IEEE.

Divisions: Faculty of Engineering and Physical Sciences > Electronic Engineering > Centre for Vision Speech and Signal Processing
Depositing User: Symplectic Elements
Date Deposited: 11 Oct 2012 12:23
Last Modified: 23 Sep 2013 19:29
URI: http://epubs.surrey.ac.uk/id/eprint/596095
