University of Surrey


Video assisted speech source separation

Wang, W, Hicks, Y, Sanei, S, Chambers, J and Cosker, D (2005) Video assisted speech source separation. In: ICASSP'05, 2005-03-18 - 2005-03-23, Philadelphia, USA.



In this paper we investigate the problem of integrating the complementary audio and visual modalities for speech separation. Rather than using the independence criteria adopted in most blind source separation (BSS) systems, we use visual features from a video signal as additional information to optimize the unmixing matrix. We achieve this by using a statistical model characterizing the nonlinear coherence between audio and visual features as a separation criterion for both instantaneous and convolutive mixtures. We acquire the model by applying a Bayesian framework to the fused feature observations from a training corpus. We point out several key existing challenges to the success of the system. Experimental results verify the proposed approach, which outperforms the audio-only separation system in a noisy environment and also provides a solution to the permutation problem.
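The instantaneous-mixture setup the abstract refers to can be sketched as follows. This is a minimal illustration of the mixing/unmixing model x = As, y = Wx only, not the paper's algorithm: the paper optimizes W against a learned audio-visual coherence model, whereas here we simply use the ideal W = A⁻¹ to show that a correct unmixing matrix recovers the sources. All variable names are ours for illustration.

```python
import numpy as np

# Instantaneous BSS model: observed mixtures x = A s, separation y = W x.
# The paper learns W via an audio-visual coherence criterion; here we use
# the ideal W = inv(A) purely to illustrate the model.
rng = np.random.default_rng(0)
n = 1000
s = rng.laplace(size=(2, n))          # two speech-like (super-Gaussian) sources
A = np.array([[1.0, 0.5],
              [0.3, 1.0]])            # instantaneous mixing matrix
x = A @ s                             # observed mixtures (e.g. two microphones)

W = np.linalg.inv(A)                  # ideal unmixing matrix
y = W @ x                             # recovered sources

print(np.allclose(y, s))              # perfect recovery in this ideal case
```

In practice A is unknown, so W must be estimated from x alone; the paper's contribution is to guide that estimation with video-derived features instead of (or alongside) a statistical independence criterion, which also resolves the ordering ambiguity behind the permutation problem.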

Item Type: Conference or Workshop Item (UNSPECIFIED)
Divisions : Surrey research (other units)
Authors : Wang, W, Hicks, Y, Sanei, S, Chambers, J and Cosker, D
Date : 9 May 2005
DOI : 10.1109/ICASSP.2005.1416331
Contributors :
Depositing User : Symplectic Elements
Date Deposited : 28 Mar 2017 14:43
Last Modified : 23 Jan 2020 12:49



