University of Surrey


Blind convolutive stereo speech separation and dereverberation.

Alinaghi, Atiyeh (2017) Blind convolutive stereo speech separation and dereverberation. Doctoral thesis, University of Surrey.

thesis.pdf - Version of Record
Available under License Creative Commons Attribution Non-commercial Share Alike.



In real environments, microphones record not only the target speech signal but also other available sources, room acoustic effects and background noise. Hence, for many applications, including automatic speech recognition, hearing aids, cochlear implants and human-machine interaction, it is desirable to extract the target speech from the noisy convolutive mixture of multiple sources. There are two major approaches to speech source separation. One group of algorithms, known as blind source separation (BSS), is based on the statistical properties of the signals. The other group, known as computational auditory scene analysis (CASA), is inspired by the human auditory system. Using either approach, a voice may be extracted by applying a mask to a time-frequency representation of the noisy reverberant mixed signal. In this thesis, these two groups of techniques are studied, compared and combined based on two state-of-the-art algorithms. For the BSS approach, a frequency-dependent mixing vector (MV) is estimated and exploited to form a probabilistic mask. In the CASA approach, binaural cues such as interaural time difference (ITD) and interaural level difference (ILD) are calculated and applied to estimate a different probabilistic mask. Since the BSS approach performs poorly in high reverberation and the CASA approach fails to separate sources that are close to each other, experiments were conducted to test their combination. The results show significant improvement in source separation under various conditions. However, the mechanism for this improvement was not clear at first glance. Further study of the methods shows that the MV-based algorithm works better when the sources are close to each other, whereas binaural cues yield better performance in the presence of reverberation. Consequently, these two major approaches give complementary improvements under adverse conditions.
High reverberation still degrades the performance of our source separation algorithm. Therefore, the precedence effect was considered as a means to tackle reverberation. In our algorithm, time-frequency regions dominated by direct sound are identified based on the interaural coherence. The results demonstrate a further significant improvement in performance.
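
As a rough illustration of the time-frequency masking idea described in the abstract, the following Python sketch computes two binaural cues (ILD and the interaural phase difference, from which ITD can be derived) from a stereo pair and builds a soft mask from the ILD. It is not the thesis's algorithm: the function names, the frame and hop sizes, and the Gaussian mask shape with its `width_db` parameter are all assumptions chosen for brevity.

```python
import numpy as np

def stft(x, frame_len=256, hop=128):
    """Short-time Fourier transform with a Hann window; returns (frames, bins)."""
    window = np.hanning(frame_len)
    n_frames = 1 + (len(x) - frame_len) // hop
    frames = np.stack([x[i * hop : i * hop + frame_len] * window
                       for i in range(n_frames)])
    return np.fft.rfft(frames, axis=1)

def binaural_cues(left, right, eps=1e-12):
    """Per time-frequency bin ILD (dB) and interaural phase difference (rad)."""
    L, R = stft(left), stft(right)
    ild_db = 20.0 * np.log10((np.abs(L) + eps) / (np.abs(R) + eps))
    ipd = np.angle(L * np.conj(R))
    return L, R, ild_db, ipd

def soft_mask_from_ild(ild_db, target_ild_db, width_db=3.0):
    """Hypothetical Gaussian soft mask: close to 1 where the ILD matches the target."""
    return np.exp(-0.5 * ((ild_db - target_ild_db) / width_db) ** 2)

# Toy example (assumed data): the right channel is the left channel at half amplitude.
rng = np.random.default_rng(0)
left = rng.standard_normal(4096)
right = 0.5 * left
L, R, ild_db, ipd = binaural_cues(left, right)
mask = soft_mask_from_ild(ild_db, target_ild_db=20.0 * np.log10(2.0))
masked_left = mask * L  # masked time-frequency representation of the left channel
```

In this toy case every bin has the same level difference, so the mask is close to one everywhere; with real mixtures the cues vary per bin and the mask would favour bins dominated by the target source.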

Item Type: Thesis (Doctoral)
Subjects : blind source separation (BSS), computational auditory scene analysis (CASA), mixing vector (MV), binaural cues, convolutive mixtures, probabilistic model, soft mask, precedence effect, speech enhancement, reverberation.
Divisions : Theses
Authors :
Alinaghi, Atiyeh
Date : 31 March 2017
Funders : Centre for Vision, Speech and Signal Processing (CVSSP)
Contributors :
Contribution, Name, Email, ORCID: P.J., W.
Depositing User : Atiyeh Alinaghi
Date Deposited : 28 Mar 2017 08:09
Last Modified : 31 Oct 2017 19:11


© The University of Surrey, Guildford, Surrey, GU2 7XH, United Kingdom.
+44 (0)1483 300800