University of Surrey

Audio-Visual Blind Source Separation.

Liu, Qingju. (2013) Audio-Visual Blind Source Separation. Doctoral thesis, University of Surrey (United Kingdom).

Text
27606655.pdf
Available under License Creative Commons Attribution Non-commercial Share Alike.

Abstract

Humans with normal hearing are generally skilful at listening selectively to a particular speech signal in the presence of competing sounds and background noise, as in a “cocktail party” environment. It is, however, extremely challenging to replicate this capability with machines. Blind source separation (BSS), which aims to recover unknown source signals from their mixtures with little or no knowledge of the sources and the mixing process, is a promising technique for addressing this problem. Among the existing BSS approaches, independent component analysis (ICA) and time-frequency (TF) masking are two popular choices for addressing the cocktail party problem, especially in a controlled environment, as these methods rely on only a few physically plausible assumptions about the sources and the mixing process. However, these algorithms operate mainly in the audio domain, and their performance is limited by the acoustic distortions caused by background noise and room reverberation, especially when such distortions become prominent. It is known that both speech production and perception are bimodal processes, involving intrinsic interactions between audition and vision. For instance, lip-reading helps the listener to better understand the target speech in a noisy and reverberant environment with multiple competing speakers. This thesis therefore considers the research question: can the visual modality be useful for improving the performance of audio-domain BSS algorithms? To this end, two key challenges have been studied in this thesis. Firstly, the audio-visual (AV) relationship must be properly evaluated, i.e. through robust AV coherence modelling that takes into account the cross-modality differences in size, sampling rate and dimensionality. To address this problem, a global method with feature-based statistical characterisation, as well as a local method with sparse audio-visual dictionary learning (AVDL), have been proposed.
Secondly, the modelled AV coherence must be fused with audio-domain BSS to separate reverberant, noisy and underdetermined mixtures. To address this problem, methods such as coherence maximisation and constrained TF masking have been used. Three schematic AV-BSS algorithms have therefore been developed to implement these ideas. In the first method, we consider speech mixtures in a controlled environment with relatively short reverberation, where parallel ICAs are applied to the noisy convolutive speech mixtures in the frequency domain. The statistically characterised AV coherence is maximised to resolve the permutation problem associated with ICA. In the second method, room environments with a stronger level of reverberation are considered, where voice activity cues are integrated into a TF masking technique for interference reduction. The voice activity cues are detected from the video signals and further enhance the audio-domain separation via a novel interference removal scheme. In the third method, instead of modelling voice activity, more explicit audio spectral information about the target speech is provided by the visual stream, through AVDL that exploits speech sparsity. The AV coherence modelled by AVDL is then used to constrain the TF masks, for separating reverberant and noisy speech mixtures acquired in real-room environments.
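
The permutation problem mentioned for the first method can be illustrated with a minimal sketch. It is not the thesis's algorithm: the statistically characterised AV coherence is replaced here by a plain Pearson correlation against a hypothetical per-speaker activity envelope (for example, one derived from lip movement), and the function name `align_permutations` and the array layouts are assumptions made for illustration only. For each frequency bin, the sketch picks the ordering of the ICA outputs whose magnitude envelopes best match the reference envelopes.

```python
from itertools import permutations

import numpy as np


def align_permutations(envelopes, reference):
    """Choose, per frequency bin, the output ordering that best matches a reference.

    envelopes: array (n_bins, n_sources, n_frames), magnitude envelopes of the
               per-bin ICA outputs, in arbitrary order within each bin.
    reference: array (n_sources, n_frames), hypothetical activity envelope for
               each speaker (an assumed stand-in for a visual cue).
    Returns an int array (n_bins, n_sources) where entry [f, k] is the index
    of the ICA output in bin f that is assigned to speaker k.
    """
    n_bins, n_sources, _ = envelopes.shape
    candidates = list(permutations(range(n_sources)))
    assignment = np.empty((n_bins, n_sources), dtype=int)
    for f in range(n_bins):
        # Score each candidate ordering by the summed correlation between
        # every reordered output envelope and its speaker's reference.
        scores = [
            sum(
                np.corrcoef(envelopes[f, p[k]], reference[k])[0, 1]
                for k in range(n_sources)
            )
            for p in candidates
        ]
        assignment[f] = candidates[int(np.argmax(scores))]
    return assignment
```

The exhaustive search over orderings is only practical for a small number of sources, since the number of candidates grows factorially. Constant envelopes would make the correlation undefined, so a real system would need to guard against silent bins.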

Item Type: Thesis (Doctoral)
Divisions : Theses
Authors : Liu, Qingju.
Date : 2013
Additional Information : Thesis (Ph.D.)--University of Surrey (United Kingdom), 2013.
Depositing User : EPrints Services
Date Deposited : 06 May 2020 12:15
Last Modified : 06 May 2020 12:20
URI: http://epubs.surrey.ac.uk/id/eprint/855829

