University of Surrey


Polyphonic sound event detection and localization using a two-stage strategy

Cao, Yin, Kong, Qiuqiang, Iqbal, Turab, An, Fengyan, Wang, Wenwu and Plumbley, Mark D. (2019) Polyphonic sound event detection and localization using a two-stage strategy. In: Detection and Classification of Acoustic Scenes and Events (DCASE 2019) Workshop, 25-26 Oct 2019, New York University's Tandon School of Engineering, Brooklyn, NY, USA.

Polyphonic sound event detection and localization using a two-stage strategy.pdf - Accepted version Manuscript



Sound event detection (SED) and localization refer to recognizing sound events and estimating their spatial and temporal locations. Neural networks have become the prevailing method for SED. Sound localization is usually performed by estimating the direction of arrival (DOA), and learning-based methods for it have recently been developed. This paper shows experimentally that a trained SED model can contribute to direction of arrival estimation (DOAE), but that joint training of SED and DOAE degrades the performance of both. Based on these results, a two-stage polyphonic sound event detection and localization method is proposed. The method learns SED first, then transfers the learned feature layers to DOAE, and uses the SED ground truth as a mask when training DOAE. The proposed method is evaluated on the DCASE 2019 Task 3 dataset, which contains different overlapping sound events in different environments. Experimental results show that the proposed method improves the performance of both SED and DOAE and performs significantly better than the baseline method.
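
The two-stage idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the loss, the network, or the tensor layout, so the mean-absolute-error loss, the (frames, classes, 2) azimuth/elevation layout, the "feature." parameter-name prefix, and both function names below are assumptions made for illustration only.

```python
import numpy as np


def transfer_feature_layers(sed_params, doa_params, prefix="feature."):
    """Copy feature-layer weights from a trained SED model into a DOAE model.

    Both arguments are plain dicts mapping parameter names to arrays.
    The "feature." prefix is a hypothetical naming convention.
    """
    out = dict(doa_params)  # shallow copy, the input dict is left unchanged
    for name, value in sed_params.items():
        if name.startswith(prefix):
            out[name] = value.copy()
    return out


def masked_doa_loss(doa_pred, doa_true, sed_true):
    """Mean absolute DOA error over cells where the SED ground truth is active.

    doa_pred, doa_true: shape (frames, classes, 2), azimuth and elevation
        (assumed layout).
    sed_true: binary array of shape (frames, classes).
    """
    # Broadcast the SED ground truth over the angle axis.
    mask = sed_true[..., None].astype(doa_pred.dtype)
    # Inactive (frame, class) cells contribute nothing to the loss.
    abs_err = np.abs(doa_pred - doa_true) * mask
    # Number of unmasked angle values; guard against an all-zero mask.
    n_active = mask.sum() * doa_pred.shape[-1]
    return abs_err.sum() / max(n_active, 1.0)
```

In this sketch, stage one would train the SED model alone, `transfer_feature_layers` would then initialise the DOAE model, and stage two would minimise `masked_doa_loss` so that DOA errors count only where an event is actually present.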

Item Type: Conference or Workshop Item (Conference Poster)
Divisions : Faculty of Engineering and Physical Sciences > Electronic Engineering
Authors :
An, Fengyan
Plumbley, Mark
Editors :
Mandel, Michael
Salamon, Justin
Ellis, Daniel P. W.
Date : 25 October 2019
Funders : Engineering and Physical Sciences Research Council (EPSRC), European Union's Horizon 2020
DOI : 10.33682/4jhy-bj81
Grant Title : Making Sense of Sounds
Copyright Disclaimer : Copyright 2019 The Authors. This work is licensed under a Creative Commons Attribution 4.0 International License. To view a copy of this license, visit:
Uncontrolled Keywords : Sound event detection; Source localization; Direction of arrival; Convolutional recurrent neural networks
Related URLs :
Additional Information : Proceedings DOI:
Depositing User : Clive Harris
Date Deposited : 05 Nov 2019 13:54
Last Modified : 14 Nov 2019 13:45

