University of Surrey

Surrey-CVSSP system for DCASE2017 challenge task 4

Xu, Yong, Kong, Qiuqiang, Wang, Wenwu and Plumbley, Mark (2017) Surrey-CVSSP system for DCASE2017 challenge task 4. In: Detection and Classification of Acoustic Scenes and Events 2017 Workshop (DCASE2017), 16-17 Nov 2017, Munich, Germany.

Surrey-CVSSP system for DCASE2017 challenge task4.pdf - Accepted version Manuscript

In this technical report, we present several methods for task 4 of the Detection and Classification of Acoustic Scenes and Events 2017 (DCASE2017) challenge. This task evaluates systems for the large-scale detection of sound events using weakly labeled training data. The data are YouTube video excerpts focusing on transportation and warning sounds because of their industrial applications. There are two subtasks: audio tagging and sound event detection from weakly labeled data. A convolutional neural network (CNN) and a gated recurrent unit (GRU) based recurrent neural network (RNN) are adopted as our basic framework. We propose a learnable gating activation function for selecting informative local features. An attention-based scheme is used to localize specific events in a weakly supervised mode. A new batch-level balancing strategy is also proposed to tackle the data imbalance problem. Fusion of posteriors from different systems is found to be effective in improving performance. In summary, we obtain a 61% F-value for the audio tagging subtask and an error rate (ER) of 0.73 for the sound event detection subtask on the development set, whereas the official multilayer perceptron (MLP) based baseline obtains only a 13.1% F-value for audio tagging and an ER of 1.02 for sound event detection.
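The abstract names three building blocks without giving formulas: a gating activation, attention-based localization, and fusion of posteriors. The following is a minimal pure-Python sketch of how such components are commonly formulated. It is not the authors' implementation. The function names, the sigmoid gate, the softmax attention weights, and the plain averaging rule are all assumptions made for illustration; the paper itself should be consulted for the actual definitions.

```python
import math


def sigmoid(x):
    """Logistic function mapping a real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def gated_unit(linear_out, gate_out):
    """Scale each linear response by a sigmoid gate (assumed gating form).

    Both inputs are lists of equal length. In a network they would come
    from two learnable branches; here they are passed in directly.
    """
    return [a * sigmoid(g) for a, g in zip(linear_out, gate_out)]


def attention_pool(frame_probs, attention_logits):
    """Pool frame-level probabilities into one clip-level probability.

    Attention weights are a softmax over time (an assumption), so frames
    with larger logits contribute more to the clip-level prediction.
    The weights themselves indicate where in time the event is likely.
    """
    exps = [math.exp(a) for a in attention_logits]
    total = sum(exps)
    weights = [e / total for e in exps]
    clip_prob = sum(w * p for w, p in zip(weights, frame_probs))
    return clip_prob, weights


def fuse_posteriors(system_probs):
    """Average posteriors from several systems (one simple fusion rule)."""
    return sum(system_probs) / len(system_probs)
```

With equal attention logits, `attention_pool` reduces to a plain mean over frames. Raising the logit of one frame moves the clip-level probability toward that frame's value, which is what allows a model trained only on clip-level (weak) labels to indicate when an event occurs.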

Item Type: Conference or Workshop Item (Conference Paper)
Divisions : Faculty of Engineering and Physical Sciences > Electronic Engineering
Authors : Xu, Yong; Kong, Qiuqiang; Wang, Wenwu; Plumbley, Mark
Editors :
Virtanen, Tuomas
Mesaros, Annamaria
Heittola, Toni
Diment, Aleksandr
Vincent, Emmanuel
Benetos, Emmanouil
Martinez Elizalde, Benjamin
Date : 6 November 2017
Funders : Engineering and Physical Sciences Research Council (EPSRC)
Copyright Disclaimer : This work is licensed under a Creative Commons Attribution 4.0 International License. To view a copy of this license, visit
Uncontrolled Keywords : DCASE2017; Convolutional neural network; Attention; Audio tagging; Sound event detection; Weakly labelled data
Depositing User : Clive Harris
Date Deposited : 30 Nov 2017 15:12
Last Modified : 11 Dec 2018 11:23


© The University of Surrey, Guildford, Surrey, GU2 7XH, United Kingdom.
+44 (0)1483 300800