University of Surrey


A Speech Synthesis Approach for High Quality Speech Separation and Generation

Liu, Qingju, Jackson, Philip and Wang, Wenwu (2019) A Speech Synthesis Approach for High Quality Speech Separation and Generation. IEEE Signal Processing Letters.

SynthesisForSeparation.pdf - Accepted version Manuscript



We propose a new method for source separation that synthesizes the source from a speech mixture corrupted by various types of environmental noise. Traditional source separation methods estimate the source from the mixture as a replica of the original source (e.g. by solving an inverse problem). In contrast, our proposed method is a synthesis-based approach that aims to generate a new signal (i.e. a “fake” source) that sounds similar to the original source. The proposed system has an encoder-decoder topology: the encoder uses a hybrid recurrent and hourglass network to predict intermediate-level features from the mixture, namely the Mel-spectrum of the target source, while the decoder is a state-of-the-art WaveNet speech synthesis network conditioned on the Mel-spectrum, which directly generates time-domain samples of the sources. Both objective and subjective evaluations were performed on the synthesized sources, and the results show the advantages of our proposed method for high-quality speech source separation and generation.
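As an illustration of the intermediate representation named in the abstract, the sketch below computes a log Mel-spectrum for a single audio frame using only the Python standard library. It is not the authors' feature pipeline: the sample rate, frame length, number of Mel bands, the HTK-style Mel formula, and all function names (`mel_filterbank`, `log_mel_frame`, and so on) are assumptions chosen for a small, self-contained example.

```python
import cmath
import math


def hz_to_mel(hz):
    """Convert a frequency in Hz to the Mel scale (HTK-style formula)."""
    return 2595.0 * math.log10(1.0 + hz / 700.0)


def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_filterbank(n_mels, n_fft, sample_rate):
    """Build n_mels triangular filters over the n_fft // 2 + 1 DFT bins."""
    n_bins = n_fft // 2 + 1
    # n_mels + 2 edge frequencies, equally spaced on the Mel scale.
    top = hz_to_mel(sample_rate / 2.0)
    edges_hz = [mel_to_hz(top * i / (n_mels + 1)) for i in range(n_mels + 2)]
    bin_hz = [k * sample_rate / n_fft for k in range(n_bins)]
    bank = []
    for m in range(1, n_mels + 1):
        lo, mid, hi = edges_hz[m - 1], edges_hz[m], edges_hz[m + 1]
        row = []
        for f in bin_hz:
            rising = (f - lo) / (mid - lo)
            falling = (hi - f) / (hi - mid)
            # Triangle: zero outside [lo, hi], peak of 1 at mid.
            row.append(max(0.0, min(rising, falling)))
        bank.append(row)
    return bank


def power_spectrum(frame):
    """Power spectrum of one Hann-windowed frame via a naive DFT."""
    n = len(frame)
    windowed = [x * (0.5 - 0.5 * math.cos(2.0 * math.pi * i / n))
                for i, x in enumerate(frame)]
    spectrum = []
    for k in range(n // 2 + 1):
        s = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                for i, x in enumerate(windowed))
        spectrum.append(abs(s) ** 2)
    return spectrum


def log_mel_frame(frame, bank, floor=1e-10):
    """Project one frame's power spectrum onto the Mel filters, then take the log."""
    power = power_spectrum(frame)
    return [math.log(max(floor, sum(w * p for w, p in zip(row, power))))
            for row in bank]


# Assumed settings for illustration only (not taken from the paper).
SAMPLE_RATE = 16000
N_FFT = 256
N_MELS = 20

bank = mel_filterbank(N_MELS, N_FFT, SAMPLE_RATE)
# A 1 kHz test tone standing in for one frame of speech.
tone = [math.sin(2.0 * math.pi * 1000.0 * i / SAMPLE_RATE) for i in range(N_FFT)]
features = log_mel_frame(tone, bank)
# Index of the Mel band with the most energy for this frame.
peak_band = max(range(N_MELS), key=lambda m: features[m])
```

In the system described above, a sequence of such frames for the target source would be what the encoder predicts from the noisy mixture and what the WaveNet decoder is conditioned on. A practical implementation would normally use an FFT library in place of the naive DFT shown here.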

Item Type: Article
Divisions : Faculty of Engineering and Physical Sciences > Electronic Engineering > Centre for Vision Speech and Signal Processing
Authors : Liu, Qingju, Jackson, Philip and Wang, Wenwu
Date : 6 November 2019
Funders : EPSRC - Engineering and Physical Sciences Research Council, BBC Audio Research Partnership
DOI : 10.1109/LSP.2019.2951894
Copyright Disclaimer : © 2019 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.
Uncontrolled Keywords : Deep learning; Speech separation; Speech synthesis; WaveNet; Hourglass; High quality.
Depositing User : Diane Maxfield
Date Deposited : 19 Nov 2019 14:45
Last Modified : 19 Nov 2019 14:45

