University of Surrey

Integrating binaural cues and blind source separation method for separating reverberant speech mixtures

Alinaghi, A, Wang, W and Jackson, PJB (2011) Integrating binaural cues and blind source separation method for separating reverberant speech mixtures. IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings, pp. 209-212.

PDF: AlinaghiWangJackson_ICASSP11.pdf - Accepted Version (81kB)
Available under licence: see the attached licence file, licence.txt (1kB).

Abstract

This paper presents a new method for reverberant speech separation, based on the combination of binaural cues and blind source separation (BSS) for the automatic classification of the time-frequency (T-F) units of the speech mixture spectrogram. The main idea is to model the interaural phase difference, the interaural level difference and the frequency bin-wise mixing vectors by Gaussian mixture models for each source, then evaluate that model at each T-F point and assign the units with high probability to that source. The model parameters and the assigned regions are refined iteratively using the Expectation-Maximization (EM) algorithm. The proposed method also addresses the permutation problem of frequency-domain BSS by initializing the mixing vectors for each frequency channel. The EM algorithm starts with binaural cues, and after a few iterations the estimated probabilistic mask is used to initialize and re-estimate the mixing vector model parameters. We performed experiments on speech mixtures and showed an average improvement of about 0.8 dB in signal-to-distortion ratio (SDR) over the binaural-only baseline.
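The binaural-cue stage described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation; it makes several simplifying assumptions: the features of all T-F units are pooled into one diagonal-covariance Gaussian mixture rather than modelled per frequency bin, the interaural phase difference is treated as an ordinary Gaussian variable (phase wrapping is ignored), and the mixing-vector model and its permutation handling are left out. The function names `binaural_cues` and `em_soft_mask`, and the quantile-based initialization, are hypothetical choices made for this illustration.

```python
import numpy as np


def binaural_cues(X_left, X_right, eps=1e-12):
    """Return ILD (dB) and IPD (radians) for each T-F unit of two STFTs."""
    ratio = (X_left + eps) / (X_right + eps)
    ild = 20.0 * np.log10(np.abs(ratio))
    ipd = np.angle(ratio)
    return ild, ipd


def em_soft_mask(features, n_sources=2, n_iter=20):
    """Fit a diagonal-covariance GMM with EM (illustrative assumption).

    features: array of shape (..., D), e.g. (freq, time, 2) for [ILD, IPD].
    Returns (posteriors of shape (..., n_sources), component means).
    The posteriors act as a probabilistic T-F mask, one per source.
    """
    x = features.reshape(-1, features.shape[-1])  # (N, D)
    n = x.shape[0]

    # Deterministic initialization (an assumption of this sketch):
    # place the means at evenly spaced quantiles of each feature.
    q = (np.arange(n_sources) + 0.5) / n_sources
    means = np.quantile(x, q, axis=0)                      # (K, D)
    var = np.tile(x.var(axis=0) + 1e-6, (n_sources, 1))    # (K, D)
    weights = np.full(n_sources, 1.0 / n_sources)

    for _ in range(n_iter):
        # E-step: posterior probability of each source at each T-F unit.
        diff = x[:, None, :] - means[None, :, :]           # (N, K, D)
        log_p = -0.5 * np.sum(diff**2 / var + np.log(2 * np.pi * var), axis=2)
        log_p += np.log(weights)
        log_p -= log_p.max(axis=1, keepdims=True)          # numerical stability
        post = np.exp(log_p)
        post /= post.sum(axis=1, keepdims=True)

        # M-step: re-estimate weights, means and variances.
        nk = post.sum(axis=0) + 1e-12
        weights = nk / n
        means = (post.T @ x) / nk[:, None]
        diff = x[:, None, :] - means[None, :, :]
        var = np.einsum("nk,nkd->kd", post, diff**2) / nk[:, None] + 1e-6

    return post.reshape(features.shape[:-1] + (n_sources,)), means
```

In use, one would stack the two cues with `np.stack([ild, ipd], axis=-1)`, pass the result to `em_soft_mask`, and multiply each returned mask with the mixture STFT before inverting it. The method in the paper additionally uses the probabilistic mask from the first few iterations to initialize the mixing-vector model, which this sketch does not attempt.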

Item Type: Article
Additional Information:

Copyright 2011 IEEE. Personal use of this material is permitted. However, permission to reprint/republish this material for advertising or promotional purposes or for creating new collective works for resale or redistribution to servers or lists, or to reuse any copyrighted component of this work in other works must be obtained from the IEEE.

Divisions: Faculty of Engineering and Physical Sciences > Electronic Engineering > Centre for Vision Speech and Signal Processing
Depositing User: Symplectic Elements
Date Deposited: 25 Nov 2011 15:06
Last Modified: 23 Sep 2013 18:50
URI: http://epubs.surrey.ac.uk/id/eprint/7722
