University of Surrey


Reverberant speech separation with probabilistic time-frequency masking for B-format recordings

Chen, X, Wang, W, Wang, Y, Zhong, X and Alinaghi, A (2015) Reverberant speech separation with probabilistic time-frequency masking for B-format recordings. Speech Communication, 68. pp. 41-54.

Text: ChenWWZA_SpeechComm_2015_postprint.pdf - Accepted version Manuscript (1MB)
Available under License : See the attached licence file.

Text (licence): SRI_deposit_agreement.pdf (33kB)
Available under License : See the attached licence file.

Abstract

© 2015 Elsevier B.V. All rights reserved.

Existing speech source separation approaches overwhelmingly rely on acoustic pressure information acquired with a microphone array. Little attention has been devoted to B-format microphones, which capture both acoustic pressure and pressure gradient and therefore allow direction of arrival (DOA) cues to be estimated from the received signal. In this paper, such DOA cues, together with frequency bin-wise mixing vector (MV) cues, are used to evaluate the contribution of a specific source at each time-frequency (T-F) point of the mixtures in order to separate that source from the mixture. A source separation algorithm is developed in which the DOA cues are modelled with a von Mises mixture model and the MV cues with a complex Gaussian mixture model, and the model parameters are estimated via an expectation-maximization (EM) algorithm. A T-F mask is then derived from the model parameters for recovering the sources. Separation performance is further improved by retaining only the reliable DOA estimates at the T-F units, selected by thresholding. The proposed method is evaluated in both simulated room environments and a real reverberant studio in terms of signal-to-distortion ratio (SDR) and the perceptual evaluation of speech quality (PESQ). The experimental results show its advantage over four baseline algorithms: three T-F mask based approaches and one convolutive independent component analysis (ICA) based method.
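As an illustration of the DOA part of the approach described in the abstract, the following is a minimal sketch rather than the authors' implementation. It assumes that the azimuth at each T-F point is taken from the active intensity vector of the B-format channels (here called `W`, `X`, `Y`), and that soft masks are the posterior responsibilities of a von Mises mixture fitted by EM. The function names, the closed-form approximation used for the concentration parameter, and the cap `kappa_max` are all assumptions made for this example. The MV cues, the complex Gaussian mixture model, and the reliability thresholding are not covered.

```python
import numpy as np

def doa_from_bformat(W, X, Y):
    """Azimuth (radians) per T-F point from B-format STFT channels.

    Assumed estimator: direction of the active intensity vector,
    (Re{conj(W) X}, Re{conj(W) Y}).
    """
    return np.arctan2(np.real(np.conj(W) * Y), np.real(np.conj(W) * X))

def fit_von_mises_mixture(theta, mu_init, n_iter=50, kappa_max=100.0):
    """Fit a K-component von Mises mixture to angles theta with EM.

    Returns means, concentrations, mixing weights, and soft masks of
    shape (K,) + theta.shape (posterior probability of each source).
    """
    shape = np.shape(theta)
    theta = np.ravel(theta)
    mu = np.array(mu_init, dtype=float)
    K = mu.size
    kappa = np.ones(K)
    weights = np.full(K, 1.0 / K)
    for _ in range(n_iter):
        # E-step: log of weight * von Mises density, then normalise over sources.
        log_norm = np.log(2.0 * np.pi * np.i0(kappa))
        log_p = (np.log(weights)[:, None]
                 + kappa[:, None] * np.cos(theta[None, :] - mu[:, None])
                 - log_norm[:, None])
        log_p -= log_p.max(axis=0, keepdims=True)
        resp = np.exp(log_p)
        resp /= resp.sum(axis=0, keepdims=True)
        # M-step: weighted circular mean and mean resultant length per source.
        Nk = resp.sum(axis=1) + 1e-12
        C = resp @ np.cos(theta)
        S = resp @ np.sin(theta)
        mu = np.arctan2(S, C)
        R = np.clip(np.hypot(C, S) / Nk, 1e-6, 1.0 - 1e-6)
        # Assumption: closed-form approximation for kappa instead of an exact solve.
        kappa = np.minimum(R * (2.0 - R**2) / (1.0 - R**2), kappa_max)
        weights = Nk / Nk.sum()
    return mu, kappa, weights, resp.reshape((K,) + shape)
```

Under these assumptions, multiplying the returned mask for source k with the STFT of the omnidirectional channel and inverting the transform would give an estimate of that source.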

Item Type: Article
Divisions : Faculty of Engineering and Physical Sciences > Electronic Engineering > Centre for Vision Speech and Signal Processing
Authors :
Authors | Email | ORCID
Chen, X | UNSPECIFIED | UNSPECIFIED
Wang, W | UNSPECIFIED | UNSPECIFIED
Wang, Y | UNSPECIFIED | UNSPECIFIED
Zhong, X | UNSPECIFIED | UNSPECIFIED
Alinaghi, A | UNSPECIFIED | UNSPECIFIED
Date : 22 January 2015
Identification Number : 10.1016/j.specom.2015.01.002
Additional Information : © 2015. This manuscript version is made available under the CC-BY-NC-ND 4.0 license
Depositing User : Symplectic Elements
Date Deposited : 18 Nov 2015 15:04
Last Modified : 22 Jun 2016 01:08
URI: http://epubs.surrey.ac.uk/id/eprint/809041

