Improving the perceptual quality of single-channel blind audio source separation.
Stokes, Tobias W. (2015) Improving the perceptual quality of single-channel blind audio source separation. Doctoral thesis, University of Surrey.
TobyStokesThesis.pdf - Thesis (version of record). Available under License Creative Commons Attribution Non-commercial Share Alike. (2MB)
2014_08_13_Author_Deposit_Agreement.docx - Thesis (version of record). Available under License Creative Commons Attribution Non-commercial Share Alike. (42kB)
Abstract
Given a mixture of audio sources, a blind audio source separation (BASS) tool is required to extract audio relating to one specific source whilst attenuating that related to all others. This thesis answers the question “How can the perceptual quality of BASS be improved for broadcasting applications?” The most common source separation scenario, particularly in the field of broadcasting, is single channel, and this is particularly challenging because only a limited set of cues is available. Broadcasting also requires that a source separator is automated, capable of handling non-stationary, reverberant mixtures and able to separate an unknown number of sources. In the single-channel case, the time-frequency mask is a common method of separation. However, this process produces artefacts in the separated audio. The perceptual evaluation for audio source separation (PEASS) toolkit represents an efficient way to generate a multi-dimensional measure of perceptual quality. Initial experimental work, using ideal target and interferer estimates, uses PEASS to test variations on the ideal binary mask and shows that continuous masks are perceptually better than binary masks, while identifying a trade-off between artefacts and interferer suppression. To explore the optimisation of this trade-off, a series of sigmoidal functions is used to map target-to-mixture ratios to mask coefficients. The mask identified as optimal discriminates less on the basis of target-to-mixture ratio than those typically found in the literature. Further experiments applying offsets, hysteresis, smoothing and frequency-dependency to the mask do not show any benefit in audio quality. The optimal sigmoidal mask is also demonstrated to be superior under non-ideal conditions, using a non-negative matrix factorisation algorithm to produce the estimates.
A final listening test compares the outputs of binary, ratio and optimal sigmoidal masks, concluding that listeners prefer the ratio mask to the sigmoidal mask and both continuous masks to the binary mask.
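The three mask families compared in the abstract can be illustrated with a minimal sketch. It is not taken from the thesis: the definition of the target-to-mixture ratio as target energy over target-plus-interferer energy, the function names, and the `slope` and `midpoint` parameters are all assumptions made for illustration, and the thesis's own sigmoid parameterisation may differ.

```python
import math


def target_to_mixture_ratio(target_energy, interferer_energy):
    """Assumed definition: target energy divided by total energy, in [0, 1]."""
    total = target_energy + interferer_energy
    return target_energy / total if total > 0 else 0.0


def binary_mask(tmr, threshold=0.5):
    """Hard decision: keep the time-frequency bin only if the target dominates."""
    return 1.0 if tmr > threshold else 0.0


def ratio_mask(tmr):
    """Continuous mask: the coefficient equals the ratio itself."""
    return tmr


def sigmoid_mask(tmr, slope=10.0, midpoint=0.5):
    """Continuous mask: a logistic function of the ratio.

    `slope` and `midpoint` are hypothetical parameters. A small slope
    discriminates weakly between target and interferer; a large slope
    approaches the binary mask.
    """
    z = slope * (tmr - midpoint)
    z = max(-700.0, min(700.0, z))  # keep math.exp within floating-point range
    return 1.0 / (1.0 + math.exp(-z))


def apply_mask(mixture_bins, mask):
    """Scale each mixture time-frequency bin by its mask coefficient."""
    return [m * x for m, x in zip(mask, mixture_bins)]


if __name__ == "__main__":
    # Toy per-bin energies, invented purely for illustration.
    target = [4.0, 1.0, 0.2]
    interferer = [1.0, 1.0, 1.8]
    tmrs = [target_to_mixture_ratio(t, i) for t, i in zip(target, interferer)]
    for name, fn in (("binary", binary_mask), ("ratio", ratio_mask), ("sigmoid", sigmoid_mask)):
        print(name, [round(fn(r), 3) for r in tmrs])
```

In this sketch, lowering `slope` flattens the sigmoid so that bins with similar ratios receive similar coefficients, which is one way to picture the trade-off between artefacts and interferer suppression described above.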
Item Type: | Thesis (Doctoral) |
---|---|
Divisions: | Theses |
Authors: | Stokes, Tobias W. |
Date: | 30 June 2015 |
Funders: | Engineering and Physical Sciences Research Council, British Broadcasting Corporation Research and Development |
Depositing User: | Tobias Stokes |
Date Deposited: | 07 Jul 2015 07:49 |
Last Modified: | 06 Jul 2019 05:14 |
URI: | http://epubs.surrey.ac.uk/id/eprint/807786 |