University of Surrey


Attention and Localization based on a Deep Convolutional Recurrent Model for Weakly Supervised Audio Tagging

Xu, Yong, Kong, Qiuqiang, Huang, Qiang, Wang, Wenwu and Plumbley, Mark (2017) Attention and Localization based on a Deep Convolutional Recurrent Model for Weakly Supervised Audio Tagging. In: Interspeech 2017, 20 - 24 August 2017, Stockholm, Sweden.

Text: IS2017_att_loc_at_final.pdf - Accepted version Manuscript (652kB)

Abstract

Audio tagging aims to perform multi-label classification on audio chunks and is a newly proposed task in the Detection and Classification of Acoustic Scenes and Events 2016 (DCASE 2016) challenge. This task encourages research efforts to better analyze and understand the content of the huge amounts of audio data on the web. The difficulty in audio tagging is that only chunk-level labels are available, with no frame-level labels. This paper presents a weakly supervised method that not only predicts the tags but also indicates the temporal locations of the acoustic events that occur. The attention scheme is found to be effective in identifying the important frames while ignoring the unrelated frames. The proposed framework is a deep convolutional recurrent model with two auxiliary modules: an attention module and a localization module. The proposed algorithm was evaluated on Task 4 of the DCASE 2016 challenge. State-of-the-art performance was achieved on the evaluation set, with the equal error rate (EER) reduced from 0.13 to 0.11 compared with the convolutional recurrent baseline system.
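
The idea of weighting frames by attention so that a chunk-level label can train frame-level scores can be illustrated with a minimal sketch. This is not the paper's implementation: the function name attention_pool, the softmax-over-time weighting, and the toy logits are all assumptions made for illustration, and in a real system both sets of logits would come from the convolutional recurrent network.

```python
import math


def sigmoid(x):
    """Logistic function mapping a logit to a probability."""
    return 1.0 / (1.0 + math.exp(-x))


def attention_pool(frame_logits, attention_logits):
    """Pool frame-level scores for ONE tag into a chunk-level probability.

    Hypothetical helper (not from the paper):
      frame_logits     - per-frame tag logits (localization-style scores)
      attention_logits - per-frame attention logits
    Returns (chunk_prob, weights, frame_probs).
    """
    if not frame_logits or len(frame_logits) != len(attention_logits):
        raise ValueError("inputs must be non-empty and of equal length")

    # Softmax over time: attention weights are non-negative and sum to 1.
    # Subtracting the maximum keeps math.exp numerically stable.
    peak = max(attention_logits)
    exps = [math.exp(a - peak) for a in attention_logits]
    total = sum(exps)
    weights = [e / total for e in exps]

    # Per-frame event probabilities; these can be read as a rough
    # indication of where in time the event occurs.
    frame_probs = [sigmoid(z) for z in frame_logits]

    # Chunk-level prediction: attention-weighted average of frame scores.
    chunk_prob = sum(w * p for w, p in zip(weights, frame_probs))
    return chunk_prob, weights, frame_probs


# Toy chunk with four frames; the event is only present in frame 1.
chunk_prob, weights, frame_probs = attention_pool(
    frame_logits=[-4.0, 3.0, -4.0, -4.0],
    attention_logits=[0.0, 5.0, 0.0, 0.0],
)
```

In this toy example the attention weights concentrate on the single active frame, so the chunk-level probability stays high even though most frames are silent. A plain average over frames would dilute the event, which is the motivation the abstract gives for ignoring unrelated frames.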

Item Type: Conference or Workshop Item (Conference Paper)
Divisions : Faculty of Engineering and Physical Sciences > Electronic Engineering
Authors :
Name              Email                      ORCID
Xu, Yong          yong.xu@surrey.ac.uk       UNSPECIFIED
Kong, Qiuqiang    q.kong@surrey.ac.uk        UNSPECIFIED
Huang, Qiang      q.huang@surrey.ac.uk       UNSPECIFIED
Wang, Wenwu       W.Wang@surrey.ac.uk        UNSPECIFIED
Plumbley, Mark    m.plumbley@surrey.ac.uk    UNSPECIFIED
Date : 24 August 2017
Identification Number : 10.21437/Interspeech.2017-486
Copyright Disclaimer : Copyright 2017 by ISCA (the International Speech Communication Association)
Uncontrolled Keywords : audio tagging, attention model, DCASE 2016 challenge, convolutional recurrent model
Depositing User : Melanie Hughes
Date Deposited : 30 May 2017 16:14
Last Modified : 24 Aug 2017 08:03
URI: http://epubs.surrey.ac.uk/id/eprint/841234


© The University of Surrey, Guildford, Surrey, GU2 7XH, United Kingdom.
+44 (0)1483 300800