University of Surrey

Sound event detection with weakly labelled data

Kong, Qiuqiang (2020) Sound event detection with weakly labelled data. Doctoral thesis, University of Surrey.

PhD thesis Sound Event Detection with Weakly Labelled Data_v2.0.pdf - Version of Record
Available under License Creative Commons Attribution Non-commercial Share Alike.

Abstract

Sound event detection (SED) is the task of detecting the onset and offset times of sound events in an audio recording. SED has many applications in both academia and industry, such as multimedia information retrieval and the monitoring of domestic and public security. However, compared to speech signal processing, which has been researched for many years, the classification and detection of general sounds was little studied until recent years. One limitation of research on audio classification and sound event detection is that few datasets were publicly available until the release of the Detection and Classification of Acoustic Scenes and Events (DCASE) dataset. The DCASE dataset consists of data for acoustic scene classification (ASC), audio tagging (AT) and sound event detection. ASC and AT are tasks to design systems that predict pre-defined labels for an audio clip. SED is a task to design systems that predict both the presence or absence of sound events in an audio clip and the onset and offset times of those events. One difficulty of the audio classification and SED tasks is that many datasets, such as the DCASE dataset, are weakly labelled. That is, only the presence or absence of sound events in an audio clip is known, without the onset and offset annotations of the sound events. This thesis focused on solving the audio tagging and sound event detection problems using only weakly labelled data. This thesis proposed attention neural networks to solve the general weakly labelled AT and SED problem. The attention neural networks can automatically learn to attend to important segments and ignore silence and irrelevant segments in an audio clip. We developed a set of weak learning methods for AT and SED using attention neural networks. The proposed methods have achieved state-of-the-art performance in audio tagging and sound event detection.
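
As an illustration only, the idea of attending to important segments under weak labels can be sketched with attention pooling: each segment gets a per-class probability and a per-class attention weight, and the clip-level prediction is the attention-weighted sum over segments. This is one common formulation, not necessarily the architecture used in the thesis; the function name `attention_pooling`, the weight matrices `w_cls` and `w_att`, and all shapes are hypothetical choices made for this sketch.

```python
import numpy as np


def sigmoid(x):
    """Element-wise logistic function."""
    return 1.0 / (1.0 + np.exp(-x))


def attention_pooling(segment_features, w_cls, w_att):
    """Pool segment-level predictions into one clip-level prediction.

    segment_features: (T, D) array, one feature vector per time segment.
    w_cls, w_att:     (D, K) arrays (hypothetical linear layers, no bias).
    Returns clip_probs (K,), segment_probs (T, K), attention (T, K).
    """
    # Per-segment, per-class probabilities (usable for detection over time).
    segment_probs = sigmoid(segment_features @ w_cls)

    # Attention weights: softmax over the time axis, separately per class.
    att_logits = segment_features @ w_att
    att_logits = att_logits - att_logits.max(axis=0, keepdims=True)
    attention = np.exp(att_logits)
    attention = attention / attention.sum(axis=0, keepdims=True)

    # Clip-level probability: attention-weighted average of segment outputs.
    clip_probs = (attention * segment_probs).sum(axis=0)
    return clip_probs, segment_probs, attention


# Example with random data: 10 segments, 4 features, 3 sound classes.
rng = np.random.default_rng(0)
features = rng.normal(size=(10, 4))
w_cls = rng.normal(size=(4, 3))
w_att = rng.normal(size=(4, 3))
clip_probs, segment_probs, attention = attention_pooling(features, w_cls, w_att)
```

In a setup like this, only `clip_probs` would be compared against the weak clip-level labels during training, while `segment_probs` could be thresholded afterwards to estimate onset and offset times.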

Item Type: Thesis (Doctoral)
Divisions : Theses
Authors : Kong, Qiuqiang
Date : 28 February 2020
Funders : China Scholarship Council (CSC)
DOI : 10.15126/thesis.00853328
Contributors :
Contribution: Thesis advisor (http://www.loc.gov/loc.terms/relators/THS)
Name: Plumbley, Mark
Email: m.plumbley@surrey.ac.uk
Depositing User : Qiuqiang Kong
Date Deposited : 06 Mar 2020 15:16
Last Modified : 06 Mar 2020 15:16
URI: http://epubs.surrey.ac.uk/id/eprint/853328

© The University of Surrey, Guildford, Surrey, GU2 7XH, United Kingdom.
+44 (0)1483 300800