University of Surrey


A Skip Attention Mechanism for Monaural Singing Voice Separation

Yuan, Weitao, Wang, Shengbei, Li, Xiangrui, Unoki, Masashi and Wang, Wenwu (2019) A Skip Attention Mechanism for Monaural Singing Voice Separation. IEEE Signal Processing Letters, 26 (10), pp. 1481-1485.

A Skip Attention Mechanism for Monaural Singing Voice Separation.pdf - Accepted version Manuscript



This work proposes a simple but effective attention mechanism, namely Skip Attention (SA), for monaural singing voice separation (MSVS). First, the SA, embedded in the convolutional encoder-decoder network (CEDN), realizes attention-driven dependency modeling of the repetitive structures of the music source. Second, the SA, replacing the popular skip connection in the CEDN, effectively controls the flow of the low-level (vocal and musical) features to the output and improves the feature sensitivity and accuracy for MSVS. Finally, we implement the proposed SA on the Stacked Hourglass Network (SHN), yielding the Skip Attention SHN (SA-SHN). Quantitative and qualitative evaluation results show that the proposed SA-SHN achieves a significant performance improvement on the MIR-1K dataset (compared to the state-of-the-art SHN) and competitive MSVS performance on the DSD100 dataset (compared to the state-of-the-art DenseNet), even without using any data augmentation methods.
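The abstract does not give the formulation of SA, so the following is only a minimal illustrative sketch of the general idea of replacing an additive skip connection with an attention-gated one. It assumes generic scaled dot-product attention in which decoder features act as queries and encoder (skip-path) features act as keys and values. The function names `skip_attention` and `plain_skip`, the tensor shapes, and the choice of attention form are all assumptions made for illustration and should not be read as the authors' method.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    shifted = x - x.max(axis=axis, keepdims=True)
    e = np.exp(shifted)
    return e / e.sum(axis=axis, keepdims=True)


def plain_skip(enc_feat, dec_feat):
    """Conventional skip connection: encoder features are added unchanged."""
    return dec_feat + enc_feat


def skip_attention(enc_feat, dec_feat):
    """Hypothetical attention-gated skip path (not the paper's exact SA).

    enc_feat, dec_feat: arrays of shape (C, H, W), e.g. channels x
    frequency x time for spectrogram features.
    Returns the attended encoder features (C, H, W) and the attention
    weights (H*W, H*W).
    """
    c, h, w = enc_feat.shape
    keys = enc_feat.reshape(c, h * w).T     # (N, C), one row per T-F position
    queries = dec_feat.reshape(c, h * w).T  # (N, C)
    # Similarity between every decoder position and every encoder position.
    scores = queries @ keys.T / np.sqrt(c)  # (N, N)
    weights = softmax(scores, axis=-1)      # each row sums to 1
    # Each output position is a weighted average of encoder positions,
    # so distant but similar (e.g. repeated) positions can contribute.
    attended = weights @ keys               # (N, C)
    return attended.T.reshape(c, h, w), weights


rng = np.random.default_rng(0)
enc = rng.standard_normal((4, 3, 5))  # toy encoder features
dec = rng.standard_normal((4, 3, 5))  # toy decoder features

attended, attn = skip_attention(enc, dec)
fused = dec + attended                # attention-gated counterpart of plain_skip
```

In this sketch, `plain_skip` passes every low-level feature to the decoder unchanged, while `skip_attention` lets the decoder state decide which encoder positions to draw on, which is one way to picture "controlling the flow" of low-level features.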

Item Type: Article
Divisions : Faculty of Engineering and Physical Sciences > Electronic Engineering
Authors :
Yuan, Weitao
Wang, Shengbei
Li, Xiangrui
Unoki, Masashi
Wang, Wenwu
Date : October 2019
DOI : 10.1109/LSP.2019.2935867
Copyright Disclaimer : © 2019 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.
Uncontrolled Keywords : Skip attention; Stacked hourglass network; Monaural singing voice separation; Music; Spectrogram; Feature extraction; Decoding; Time-frequency analysis; Convolution; Training
Depositing User : Clive Harris
Date Deposited : 10 Sep 2019 07:52
Last Modified : 28 Oct 2019 09:59
