University of Surrey


Separation of underdetermined reverberant speech mixtures by monaural, binaural and statistical cue combination

Alinaghi, A, Jackson, PJB and Wang, W (2012) Separation of underdetermined reverberant speech mixtures by monaural, binaural and statistical cue combination

Available under License : See the attached licence file.



Underdetermined reverberant speech separation is a challenging problem in source separation that has received considerable attention in both computational auditory scene analysis (CASA) and blind source separation (BSS). Recent studies suggest that, in general, the performance of frequency-domain BSS methods suffers from the permutation problem across frequencies, which worsens in high reverberation, while CASA methods perform less effectively for closely spaced sources. This paper presents a method to address these limitations, based on the combination of monaural, binaural and BSS cues for the automatic classification of time-frequency (T-F) units of the speech mixture spectrogram. By modeling the interaural phase difference, the interaural level difference and frequency-bin mixing vectors, we integrate the coherence information for each source within a probabilistic framework. The Expectation-Maximization (EM) algorithm is then used iteratively to refine the soft assignment of T-F regions to sources and re-estimate their model parameters. It is observed that the reliability of the cues affects the accuracy of the estimates and varies with respect to cue type and frequency. As such, the contribution of each cue to the assignment decision is adjusted by weighting the log-likelihoods of the cues empirically, which significantly improves the performance. Results are reported for binaural speech mixtures in five rooms covering a range of reverberation times and direct-to-reverberant ratios. The proposed method compares favorably with state-of-the-art baseline algorithms by Mandel et al. and Sawada et al., in terms of signal-to-distortion ratio (SDR) of the separated source signals. The paper also investigates the effect of introducing spectral cues for integration within the same framework.
Analysis of the experimental outcomes will include a comparison of the contribution of individual cues under varying conditions and discussion of the implications for system optimization.
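
The cue-weighting step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function name soft_assign, the cue labels and the weight values are hypothetical, and the per-cue log-likelihoods are random placeholders rather than outputs of fitted models. The sketch shows only the assignment (E-step) side of EM: per-cue log-likelihoods are scaled by weights, summed, and normalized across sources to give a soft mask for each T-F unit. The re-estimation of model parameters (M-step) is omitted.

```python
import numpy as np

def soft_assign(log_liks, weights):
    """Combine weighted per-cue log-likelihoods into soft T-F masks.

    log_liks: dict mapping a cue name to an array of shape
              (n_sources, n_freq, n_frames) of log-likelihoods.
    weights:  dict mapping the same cue names to a scalar, or to an
              array of shape (n_freq, 1) for frequency-dependent weights.
    Returns an array of shape (n_sources, n_freq, n_frames) whose
    entries sum to 1 over the source axis.
    """
    # Weighted sum of the cue log-likelihoods (weights are assumptions).
    total = sum(weights[cue] * ll for cue, ll in log_liks.items())
    # Subtract the per-unit maximum for numerical stability before exp.
    total = total - total.max(axis=0, keepdims=True)
    post = np.exp(total)
    # Normalize over sources to obtain posterior probabilities.
    return post / post.sum(axis=0, keepdims=True)

# Placeholder data: 2 sources, 4 frequency bins, 5 time frames.
rng = np.random.default_rng(0)
shape = (2, 4, 5)
log_liks = {
    "ipd": rng.normal(size=shape),            # interaural phase difference
    "ild": rng.normal(size=shape),            # interaural level difference
    "mixing_vector": rng.normal(size=shape),  # frequency-bin mixing vectors
}
# Hypothetical weights; the last one varies with frequency.
weights = {
    "ipd": 1.0,
    "ild": 0.5,
    "mixing_vector": np.linspace(0.2, 1.0, shape[1])[:, None],
}
masks = soft_assign(log_liks, weights)
```

A weight above 1 sharpens a cue's influence on the assignment and a weight below 1 flattens it, which is one way to read the abstract's remark that the contribution of each cue is adjusted according to its reliability.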

Item Type: Conference or Workshop Item (Conference Paper)
Divisions : Faculty of Engineering and Physical Sciences > Electronic Engineering
Authors :
Alinaghi, A
Jackson, PJB
Wang, W
Date : November 2012
Depositing User : Symplectic Elements
Date Deposited : 30 Oct 2013 09:50
Last Modified : 09 Jun 2014 13:43


© The University of Surrey, Guildford, Surrey, GU2 7XH, United Kingdom.
+44 (0)1483 300800