University of Surrey


Acoustic vector sensor based reverberant speech separation with probabilistic time-frequency masking

Zhong, X, Premkumar, AB, Chen, X, Wang, W and Alinaghi, A (2013) Acoustic vector sensor based reverberant speech separation with probabilistic time-frequency masking. In: 21st European Signal Processing Conference, 2013-09-09 - 2013-09-13, Marrakech.

Text: ZhongCWAP_EUSIPCO_2013.pdf - Accepted version (post-print) (282kB)
Available under License : See the attached licence file.

PDF (licence): SRI_deposit_agreement.pdf (33kB)
Available under License : See the attached licence file.

Abstract

Most existing speech source separation algorithms have been developed for separating sound mixtures acquired by using a conventional microphone array. In contrast, little attention has been paid to the problem of source separation using an acoustic vector sensor (AVS). We propose a new method for the separation of convolutive mixtures by incorporating the intensity vector of the acoustic field, obtained using spatially co-located microphones which carry the direction of arrival (DOA) information. The DOA cues from the intensity vector, together with the frequency bin-wise mixing vector cues, are then used to determine the probability of each time-frequency (T-F) point of the mixture being dominated by a specific source, based on Gaussian mixture models (GMMs), whose parameters are evaluated and refined iteratively using an expectation-maximization (EM) algorithm. Finally, the probability is used to derive the T-F masks for recovering the sources. The proposed method is evaluated in simulated reverberant environments in terms of signal-to-distortion ratio (SDR), giving an average improvement of approximately 1.5 dB as compared with a related T-F mask approach based on a conventional microphone setting. © 2013 EURASIP.
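The pipeline outlined in the abstract (DOA cue per T-F point, GMM fitted by EM, posterior probabilities used as soft masks) can be illustrated with a minimal sketch. This is not the authors' implementation: it uses only the DOA cue and omits the mixing vector cue, it treats angles as ordinary real numbers (no wrap-around handling), and the function names `intensity_doa` and `em_soft_masks`, the quantile initialisation, and the variance floor are all assumptions made for illustration. The sign convention of the intensity vector also depends on the sensor and may differ in practice.

```python
import numpy as np

def intensity_doa(P, Vx, Vy):
    """Per-bin DOA in radians from the active intensity vector Re{conj(P) * V}.

    P, Vx, Vy are STFT arrays of the pressure and the two velocity
    channels (hypothetical inputs; any matching shapes work).
    """
    ix = np.real(np.conj(P) * Vx)
    iy = np.real(np.conj(P) * Vy)
    return np.arctan2(iy, ix)

def em_soft_masks(doa, n_sources, n_iter=50, var_floor=1e-4):
    """Fit a 1-D GMM to DOA values with EM and return posteriors as soft masks."""
    x = doa.ravel()
    # Assumed initialisation: means at evenly spaced quantiles of the data.
    mu = np.quantile(x, (np.arange(n_sources) + 0.5) / n_sources)
    var = np.full(n_sources, np.var(x) / n_sources + var_floor)
    w = np.full(n_sources, 1.0 / n_sources)
    for _ in range(n_iter):
        # E-step: posterior of each source at each T-F point (log domain).
        log_p = (np.log(w)[:, None]
                 - 0.5 * np.log(2 * np.pi * var)[:, None]
                 - 0.5 * (x[None, :] - mu[:, None]) ** 2 / var[:, None])
        log_p -= log_p.max(axis=0, keepdims=True)
        post = np.exp(log_p)
        post /= post.sum(axis=0, keepdims=True)
        # M-step: re-estimate weights, means and variances.
        nk = post.sum(axis=1) + 1e-12
        w = nk / nk.sum()
        mu = (post @ x) / nk
        var = (post * (x[None, :] - mu[:, None]) ** 2).sum(axis=1) / nk + var_floor
    return post.reshape((n_sources,) + doa.shape), mu
```

Under these assumptions, an estimate of source k would be obtained by multiplying `masks[k]` element-wise with the mixture STFT and inverting the transform.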

Item Type: Conference or Workshop Item (Conference Paper)
Divisions : Faculty of Engineering and Physical Sciences > Electronic Engineering > Centre for Vision Speech and Signal Processing
Authors :
Authors          Email         ORCID
Zhong, X         UNSPECIFIED   UNSPECIFIED
Premkumar, AB    UNSPECIFIED   UNSPECIFIED
Chen, X          UNSPECIFIED   UNSPECIFIED
Wang, W          UNSPECIFIED   UNSPECIFIED
Alinaghi, A      UNSPECIFIED   UNSPECIFIED
Date : 2013
Contributors :
Contribution   Name   Email         ORCID
Publisher      IEEE   UNSPECIFIED   UNSPECIFIED
Additional Information : © 2013 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.
Depositing User : Symplectic Elements
Date Deposited : 30 Sep 2014 15:37
Last Modified : 01 Oct 2014 01:33
URI: http://epubs.surrey.ac.uk/id/eprint/806096
