University of Surrey


Convolutional Gated Recurrent Neural Network Incorporating Spatial Features for Audio Tagging

Xu, Yong, Kong, Qiuqiang, Huang, Qiang, Wang, Wenwu and Plumbley, Mark (2017) Convolutional Gated Recurrent Neural Network Incorporating Spatial Features for Audio Tagging. In: The 2017 International Joint Conference on Neural Networks (IJCNN 2017), 2017-05-14 - 2017-05-19, Anchorage, Alaska.

Text
PID4655635_ijcnn_yong_v2.pdf - Accepted version Manuscript
Available under License : See the attached licence file.

PDF (licence)
SRI_deposit_agreement.pdf
Available under License : See the attached licence file.


Abstract

Environmental audio tagging is a newly proposed task that aims to predict the presence or absence of a specific audio event in an audio chunk. Deep neural network (DNN) based methods have been successfully adopted for predicting audio tags in the domestic audio scene. In this paper, we propose to use a convolutional neural network (CNN) to extract robust features from mel-filter banks (MFBs), spectrograms or even raw waveforms for audio tagging. Gated recurrent unit (GRU) based recurrent neural networks (RNNs) are then cascaded to model the long-term temporal structure of the audio signal. To complement the input information, an auxiliary CNN is designed to learn from the spatial features of stereo recordings. We evaluate our proposed methods on Task 4 (audio tagging) of the Detection and Classification of Acoustic Scenes and Events 2016 (DCASE 2016) challenge. Compared with our recent DNN-based method, the proposed structure reduces the equal error rate (EER) from 0.13 to 0.11 on the development set. The spatial features further reduce the EER to 0.10. End-to-end learning on raw waveforms also gives comparable performance. Finally, on the evaluation set, we achieve state-of-the-art performance with an EER of 0.12, compared with 0.15 for the best existing system.
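
The abstract reports its results as an equal error rate (EER), the operating point at which the false positive rate and the false negative rate are equal. This record does not say how the authors or the DCASE 2016 challenge computed the metric, so the following is only a generic sketch for a single tag. The function name `equal_error_rate` and the threshold sweep are illustrative assumptions, not the paper's evaluation code.

```python
def equal_error_rate(labels, scores):
    """Approximate the EER for one tag (hypothetical helper, not from the paper).

    labels: iterable of 0/1 ground-truth values (1 = tag present in the chunk).
    scores: iterable of floats, where a higher score means "more likely present".
    """
    labels = list(labels)
    scores = list(scores)
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one positive and one negative chunk")

    best_gap, eer = None, None
    # Candidate thresholds: every distinct score, plus +inf so that the
    # "predict nothing" operating point is also considered.
    thresholds = sorted(set(scores)) + [float("inf")]
    for t in thresholds:
        # A chunk is predicted positive when its score is >= t.
        false_pos = sum(1 for y, s in zip(labels, scores) if y == 0 and s >= t)
        false_neg = sum(1 for y, s in zip(labels, scores) if y == 1 and s < t)
        fpr = false_pos / n_neg
        fnr = false_neg / n_pos
        gap = abs(fpr - fnr)
        # Keep the threshold where the two error rates are closest and
        # report their mean as the EER estimate.
        if best_gap is None or gap < best_gap:
            best_gap, eer = gap, (fpr + fnr) / 2
    return eer


if __name__ == "__main__":
    # Toy data, invented for illustration only.
    print(equal_error_rate([0, 1, 0, 1], [0.1, 0.4, 0.6, 0.9]))
```

With several tags, one plausible convention is to compute this value per tag and average the results, but that averaging is an assumption here rather than something stated in the record.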

Item Type: Conference or Workshop Item (Conference Paper)
Subjects : Electronic Engineering
Divisions : Faculty of Engineering and Physical Sciences > Electronic Engineering
Authors :
Name | Email | ORCID
Xu, Yong | yong.xu@surrey.ac.uk | UNSPECIFIED
Kong, Qiuqiang | q.kong@surrey.ac.uk | UNSPECIFIED
Huang, Qiang | q.huang@surrey.ac.uk | UNSPECIFIED
Wang, Wenwu | W.Wang@surrey.ac.uk | UNSPECIFIED
Plumbley, Mark | m.plumbley@surrey.ac.uk | UNSPECIFIED
Date : 2017
Copyright Disclaimer : © 2017 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.
Contributors :
Contribution | Name | Email | ORCID
UNSPECIFIED | IEEE | UNSPECIFIED | UNSPECIFIED
Related URLs :
Depositing User : Symplectic Elements
Date Deposited : 24 Feb 2017 11:55
Last Modified : 19 Jul 2017 11:11
URI: http://epubs.surrey.ac.uk/id/eprint/813631
