University of Surrey

Improving the perceptual quality of single-channel blind audio source separation.

Stokes, Tobias W. (2015) Improving the perceptual quality of single-channel blind audio source separation. Doctoral thesis, University of Surrey.

TobyStokesThesis.pdf - Thesis (version of record), 2MB
Available under License Creative Commons Attribution Non-commercial Share Alike.

2014_08_13_Author_Deposit_Agreement.docx - Thesis (version of record), 42kB
Available under License Creative Commons Attribution Non-commercial Share Alike.

Abstract

Given a mixture of audio sources, a blind audio source separation (BASS) tool is required to extract the audio relating to one specific source whilst attenuating that relating to all others. This thesis answers the question “How can the perceptual quality of BASS be improved for broadcasting applications?” The most common source separation scenario, particularly in the field of broadcasting, is single-channel, which is especially challenging because only a limited set of cues is available. Broadcasting also requires that a source separator be automated, capable of handling non-stationary, reverberant mixtures, and able to separate an unknown number of sources. In the single-channel case, the time-frequency mask is a common method of separation. However, this process produces artefacts in the separated audio. The perceptual evaluation for audio source separation (PEASS) toolkit represents an efficient way to generate a multi-dimensional measure of perceptual quality. Initial experimental work, using ideal target and interferer estimates, uses PEASS to test variations on the ideal binary mask; it shows that continuous masks are perceptually better than binary masks and identifies a trade-off between artefacts and interferer suppression. To explore the optimisation of this trade-off, a series of sigmoidal functions is used to map target-to-mixture ratios to mask coefficients. The mask identified as optimal applies less target-to-mixture-based discrimination than those typically found in the literature. Further experiments applying offsets, hysteresis, smoothing and frequency-dependency to the mask do not show any benefit in audio quality. The optimal sigmoidal mask is also demonstrated to be superior under non-ideal conditions, using a non-negative matrix factorisation algorithm to produce the estimates. A final listening test compares the outputs of binary, ratio and optimal sigmoidal masks, concluding that listeners prefer the ratio mask to the sigmoidal mask and both continuous masks to the binary mask.
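
The three mask families compared in the abstract can be illustrated with a minimal sketch. This is not the thesis's implementation: the target-to-mixture ratio definition used here (target magnitude over the sum of target and interferer magnitudes), the logistic form of the sigmoid, and the `slope` and `midpoint` parameters are all assumptions chosen for illustration, and the toy magnitudes are invented.

```python
import numpy as np


def target_to_mixture_ratio(target_mag, interferer_mag, eps=1e-12):
    """Assumed definition: target magnitude over summed magnitudes, in [0, 1]."""
    return target_mag / (target_mag + interferer_mag + eps)


def binary_mask(tmr, threshold=0.5):
    """Pass a time-frequency cell only if the target dominates it."""
    return (tmr > threshold).astype(float)


def ratio_mask(tmr):
    """Use the ratio itself as a continuous mask coefficient."""
    return tmr


def sigmoidal_mask(tmr, slope=10.0, midpoint=0.5):
    """Logistic mapping from ratio to coefficient (hypothetical parameters).

    A large slope approaches the binary mask; a small slope discriminates
    less sharply between target-dominated and interferer-dominated cells.
    """
    return 1.0 / (1.0 + np.exp(-slope * (tmr - midpoint)))


# Toy magnitude spectrograms (rows: frequency bins, columns: time frames).
target = np.array([[1.0, 0.2], [0.5, 0.0]])
interferer = np.array([[0.0, 0.8], [0.5, 1.0]])
mixture = target + interferer  # simplification: magnitudes add, phase ignored

tmr = target_to_mixture_ratio(target, interferer)
masks = {
    "binary": binary_mask(tmr),
    "ratio": ratio_mask(tmr),
    "sigmoidal": sigmoidal_mask(tmr),
}
for name, mask in masks.items():
    estimate = mask * mixture  # masked mixture approximates the target
    print(name, np.round(estimate, 3).tolist())
```

In this sketch the `slope` parameter controls how strongly the mask discriminates on the ratio, which is the kind of trade-off between artefacts and interferer suppression that the abstract describes.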

Item Type: Thesis (Doctoral)
Divisions : Theses
Authors :
Authors | Email | ORCID
Stokes, Tobias W. | tobywstokes@gmail.com | UNSPECIFIED
Date : 30 June 2015
Funders : Engineering and Physical Sciences Research Council, British Broadcasting Corporation Research and Development
Contributors :
Contribution | Name | Email | ORCID
Thesis supervisor | Brookes, Tim | UNSPECIFIED | UNSPECIFIED
Thesis supervisor | Hummersone, C | UNSPECIFIED | UNSPECIFIED
Depositing User : Tobias Stokes
Date Deposited : 07 Jul 2015 07:49
Last Modified : 07 Jul 2015 07:49
URI: http://epubs.surrey.ac.uk/id/eprint/807786

© The University of Surrey, Guildford, Surrey, GU2 7XH, United Kingdom.
+44 (0)1483 300800