University of Surrey


Sound Event Detection of Weakly Labelled Data with CNN-Transformer and Automatic Threshold Optimization

Kong, Qiuqiang, Xu, Yong, Wang, Wenwu and Plumbley, Mark (2020) Sound Event Detection of Weakly Labelled Data with CNN-Transformer and Automatic Threshold Optimization. IEEE/ACM Transactions on Audio, Speech and Language Processing, 28. pp. 2450-2460.

KongXuWangPlumbley20-aslp_accepted.pdf - Accepted version Manuscript



Sound event detection (SED) is the task of detecting sound events in an audio recording. One challenge of the SED task is that many datasets, such as the Detection and Classification of Acoustic Scenes and Events (DCASE) datasets, are weakly labelled. That is, there are only audio tags for each audio clip, without the onset and offset times of sound events. We compare segment-wise and clip-wise training for SED, a comparison that is lacking in previous works. We propose a convolutional neural network transformer (CNN-Transformer) for audio tagging and SED, and show that CNN-Transformer performs similarly to a convolutional recurrent neural network (CRNN). Another challenge of SED is that thresholds are required for detecting sound events. Previous works set thresholds empirically, which is not an optimal approach. To solve this problem, we propose an automatic threshold optimization method. The first stage is to optimize the system with respect to metrics that do not depend on thresholds, such as mean average precision (mAP). The second stage is to optimize the thresholds with respect to metrics that depend on those thresholds. Our proposed automatic threshold optimization system achieves a state-of-the-art audio tagging F1 of 0.646, outperforming the F1 of 0.629 obtained without threshold optimization, and a sound event detection F1 of 0.584, outperforming the F1 of 0.564 obtained without threshold optimization.
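The second stage described in the abstract, tuning thresholds against a threshold-dependent metric once the model is fixed, can be illustrated with a minimal sketch. This is not the authors' optimization procedure, which the abstract does not specify; it swaps in a plain per-class grid search over candidate thresholds to maximize F1. The function names `f1_score` and `optimize_thresholds`, the candidate grid, and the toy data are all assumptions made for illustration.

```python
from typing import List, Sequence


def f1_score(targets: Sequence[int], scores: Sequence[float], threshold: float) -> float:
    """Binary F1 when scores at or above `threshold` count as detections."""
    tp = fp = fn = 0
    for target, score in zip(targets, scores):
        predicted = score >= threshold
        if predicted and target == 1:
            tp += 1
        elif predicted and target == 0:
            fp += 1
        elif not predicted and target == 1:
            fn += 1
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def optimize_thresholds(
    targets: List[List[int]],
    scores: List[List[float]],
    candidates: Sequence[float],
) -> List[float]:
    """Pick, for each class, the candidate threshold with the highest F1.

    `targets` and `scores` are clips x classes. Ties go to the first
    (lowest-index) candidate. Grid search is an illustrative stand-in,
    not the method used in the paper.
    """
    n_classes = len(targets[0])
    best = []
    for k in range(n_classes):
        targets_k = [row[k] for row in targets]
        scores_k = [row[k] for row in scores]
        best.append(max(candidates, key=lambda th: f1_score(targets_k, scores_k, th)))
    return best


# Toy data (hypothetical): 4 clips, 2 classes. Class 1 has low scores overall,
# so a fixed threshold of 0.5 would miss every positive clip for that class.
toy_targets = [[1, 0], [1, 1], [0, 1], [0, 0]]
toy_scores = [[0.92, 0.25], [0.65, 0.35], [0.45, 0.32], [0.05, 0.05]]
toy_candidates = [i / 10 for i in range(1, 10)]

print(optimize_thresholds(toy_targets, toy_scores, toy_candidates))
```

On this toy data the search keeps 0.5 for the first class but lowers the threshold to 0.3 for the second, which is the kind of per-class adjustment an empirically fixed threshold cannot provide. In practice the thresholds would be tuned on validation data rather than on the evaluation set.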

Item Type: Article
Divisions : Faculty of Engineering and Physical Sciences > Electronic Engineering
Authors :
Kong, Qiuqiang
Xu, Yong
Date : 12 August 2020
Funders : Engineering and Physical Sciences Research Council (EPSRC), China Scholarship Council, EPSRC Doctoral Training Partnership
DOI : 10.1109/TASLP.2020.3014737
Grant Title : EPSRC Grant
Copyright Disclaimer : © 2020 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.
Uncontrolled Keywords : Sound event detection (SED); Weakly labelled data; Automatic threshold optimization; Task analysis; Training; Feature extraction; Shape; Event detection; Spectrogram; Logic gates
Additional Information : Embargo OK. No further action.
Depositing User : James Marshall
Date Deposited : 06 Aug 2020 14:41
Last Modified : 14 Sep 2020 13:22


© The University of Surrey, Guildford, Surrey, GU2 7XH, United Kingdom.
+44 (0)1483 300800