University of Surrey


Underdetermined model-based blind source separation of reverberant speech mixtures using spatial cues in a variational Bayesian framework

Popa, V, Wang, W and Alinaghi, A (2013) Underdetermined model-based blind source separation of reverberant speech mixtures using spatial cues in a variational Bayesian framework. In: IET Intelligent Signal Processing Conference, 2013-12-02 - 2013-12-03, London.

Text: PopaWA_ISP_2013.pdf - Accepted version (post-print) (334kB)
Available under License : See the attached licence file.

PDF (licence): SRI_deposit_agreement.pdf (33kB)
Available under License : See the attached licence file.

Abstract

In this paper, we propose a new method for underdetermined blind source separation of reverberant speech mixtures. Each time-frequency (T-F) point of the mixtures is classified according to a combined variational Bayesian model of spatial cues, under a sparse signal representation assumption. We model the T-F observations by a variational mixture of circularly-symmetric complex Gaussians. The spatial cues, e.g. interaural level difference (ILD), interaural phase difference (IPD) and mixing vector cues, are modelled by a variational mixture of Gaussians. We then establish appropriate conjugate prior distributions for the parameters of all the mixtures to create a variational Bayesian framework. Using the Bayesian approach, we iteratively estimate the hyper-parameters of the prior distributions by optimizing the variational posterior distribution. The main advantage of this approach is that no prior knowledge of the number of sources is needed; the number is determined automatically by the algorithm. Unlike the Expectation-Maximization (EM) algorithm, the proposed approach does not suffer from the overfitting problem and is therefore not sensitive to initialization.
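As an illustration of the spatial cues named in the abstract, the sketch below computes the ILD and IPD at a single T-F point of a two-channel mixture. It assumes the commonly used definitions (ILD as 20 log10 of the magnitude of the left/right ratio, IPD as the phase of that ratio); the paper itself may define or normalise the cues differently. The function name `spatial_cues` and the sample coefficients are hypothetical and do not come from the paper.

```python
import cmath
import math
from typing import Tuple


def spatial_cues(left: complex, right: complex) -> Tuple[float, float]:
    """Return (ILD in dB, IPD in radians) for one T-F point.

    Assumed definitions (not taken from the paper):
      ILD = 20 * log10(|L / R|)
      IPD = arg(L / R), wrapped to (-pi, pi]
    """
    if left == 0 or right == 0:
        raise ValueError("both channels must be non-zero at this T-F point")
    ratio = left / right
    ild_db = 20.0 * math.log10(abs(ratio))
    ipd_rad = cmath.phase(ratio)
    return ild_db, ipd_rad


# Hypothetical STFT coefficients at one T-F point:
# the left channel is twice as strong and leads by 0.3 rad.
left = cmath.rect(2.0, 0.5)
right = cmath.rect(1.0, 0.2)

ild, ipd = spatial_cues(left, right)
print(f"ILD = {ild:.2f} dB, IPD = {ipd:.2f} rad")
```

In a separation system of the kind the abstract describes, cues like these would be computed at every T-F point and passed to the mixture model, which assigns each point to a source.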

Item Type: Conference or Workshop Item (Conference Paper)
Divisions : Faculty of Engineering and Physical Sciences > Electronic Engineering > Centre for Vision Speech and Signal Processing
Authors :
Authors | Email | ORCID
Popa, V | UNSPECIFIED | UNSPECIFIED
Wang, W | UNSPECIFIED | UNSPECIFIED
Alinaghi, A | UNSPECIFIED | UNSPECIFIED
Date : 2013
Identification Number : 10.1049/cp.2013.2074
Contributors :
Contribution | Name | Email | ORCID
Publisher | IET | UNSPECIFIED | UNSPECIFIED
Additional Information : This paper is a postprint of a paper submitted to and accepted for publication in IET Conference Publications and is subject to Institution of Engineering and Technology Copyright. The copy of record is available at IET Digital Library.
Depositing User : Symplectic Elements
Date Deposited : 30 Sep 2014 15:57
Last Modified : 01 Oct 2014 01:33
URI: http://epubs.surrey.ac.uk/id/eprint/806100

© The University of Surrey, Guildford, Surrey, GU2 7XH, United Kingdom.
+44 (0)1483 300800