University of Surrey


Use of bimodal coherence to resolve the permutation problem in convolutive BSS

Liu, Q, Wang, W and Jackson, P (2012) Use of bimodal coherence to resolve the permutation problem in convolutive BSS. Signal Processing, 92 (8). pp. 1916-1927.

Abstract

Recent studies show that facial information contained in visual speech can improve the performance of audio-only blind source separation (BSS) algorithms. Such information is exploited through the statistical characterization of the coherence between the audio and visual speech using, e.g., a Gaussian mixture model (GMM). In this paper, we present three contributions. First, with the synchronized features, we propose an adapted expectation maximization (AEM) algorithm to model the audiovisual coherence in the off-line training process. Second, to improve the accuracy of this coherence model, we use a frame selection scheme to discard nonstationary features. Third, with the coherence maximization technique, we develop a new sorting method to solve the permutation problem in the frequency domain. We test our algorithm on a multimodal speech database composed of different combinations of vowels and consonants. The experimental results show that our proposed algorithm outperforms traditional audio-only BSS, which confirms the benefit of using visual speech to assist in separation of the audio. © 2011 Elsevier B.V. All rights reserved.
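
The permutation problem mentioned in the abstract arises because frequency-domain BSS separates each frequency bin independently, so the order of the separated outputs can differ from bin to bin. The following is a minimal sketch of the general idea of sorting by score maximization, not the authors' algorithm. The names `resolve_permutations`, `toy_score` and `targets` are hypothetical, and the toy score is only a stand-in for a trained audiovisual coherence model such as the GMM described above.

```python
from itertools import permutations


def resolve_permutations(bins, score):
    """Reorder the separated components in every frequency bin.

    bins  : list over frequency bins; each entry is a list of K components
            (one per separated output, in arbitrary order).
    score : hypothetical callable score(k, component) -> float, assumed to be
            higher when `component` is more coherent with speaker k.
    """
    aligned = []
    for components in bins:
        k_range = range(len(components))
        # Exhaustive search over the K! orderings; acceptable for small K.
        best = max(
            permutations(k_range),
            key=lambda perm: sum(score(k, components[perm[k]]) for k in k_range),
        )
        aligned.append([components[i] for i in best])
    return aligned


# Toy data (assumption): each component is a scalar, and the stand-in score
# prefers components close to a per-speaker target value.
targets = [1.0, 5.0]


def toy_score(k, component):
    return -abs(component - targets[k])


bins = [[1.1, 4.9], [5.2, 0.8], [0.9, 5.1]]
aligned = resolve_permutations(bins, toy_score)
print(aligned)
```

In this toy run the second bin is swapped so that every bin lists the component for speaker 0 first. In a real system the score would come from a coherence model evaluated on audio and visual features; that part is not shown here.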

Item Type: Article
Divisions : Faculty of Engineering and Physical Sciences > Electronic Engineering > Centre for Vision Speech and Signal Processing
Authors :
Author | Email | ORCID
Liu, Q | UNSPECIFIED | UNSPECIFIED
Wang, W | UNSPECIFIED | UNSPECIFIED
Jackson, P | UNSPECIFIED | UNSPECIFIED
Date : August 2012
Identification Number : 10.1016/j.sigpro.2011.11.007
Additional Information : NOTICE: this is the preprint version of a work that was submitted and subsequently accepted for publication in Signal Processing. Changes resulting from the publishing process, such as peer review, editing, corrections, structural formatting, and other quality control mechanisms may not be reflected in this document. Changes may have been made to this work since it was submitted for publication. A definitive version was subsequently published in Signal Processing, 92(8), August 2012, DOI 10.1016/j.sigpro.2011.11.007.
Depositing User : Symplectic Elements
Date Deposited : 19 Feb 2013 17:27
Last Modified : 09 Jun 2014 13:45
URI: http://epubs.surrey.ac.uk/id/eprint/596081


© The University of Surrey, Guildford, Surrey, GU2 7XH, United Kingdom.
+44 (0)1483 300800