University of Surrey


Ren, Zhao, Kong, Qiuqiang, Han, Jing, Plumbley, Mark and Schuller, Björn W (2019) ATTENTION-BASED ATROUS CONVOLUTIONAL NEURAL NETWORKS: VISUALISATION AND UNDERSTANDING PERSPECTIVES OF ACOUSTIC SCENES In: IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP 2019), 2019-05-12 - 2019-05-17, Brighton, UK.

RenKongHanPS19-icassp.pdf - Accepted version Manuscript


The goal of Acoustic Scene Classification (ASC) is to recognise the environment in which an audio waveform has been recorded. Recently, deep neural networks have been applied to ASC and have achieved state-of-the-art performance. However, few works have investigated how to visualise and understand what a neural network has learnt from acoustic scenes. Previous work applied local pooling after each convolutional layer, thereby reducing the size of the feature maps. In this paper, we suggest that local pooling is not necessary, but that the size of the receptive field is important. We apply atrous Convolutional Neural Networks (CNNs) with global attention pooling as the classification model. The internal feature maps of the attention model can be visualised and explained. On the Detection and Classification of Acoustic Scenes and Events (DCASE) 2018 dataset, our proposed method achieves an accuracy of 72.7 %, significantly outperforming the CNNs without dilation at 60.4 %. Furthermore, our results demonstrate that the learnt feature maps contain rich information on acoustic scenes in the time-frequency domain.
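
The abstract's point that the receptive field matters more than local pooling can be illustrated with a short calculation. The sketch below is not taken from the paper: the function name `receptive_field`, the 3-tap kernel, and the dilation rates 1, 2, 4, 8 are illustrative assumptions. It uses the standard result that, for stacked stride-1 convolutions, each layer widens the receptive field by (kernel size - 1) × dilation along one axis.

```python
def receptive_field(kernel_size, dilations):
    """Receptive field along one axis of stacked stride-1 convolutions.

    Each layer adds (kernel_size - 1) * dilation input positions.
    The layer configuration is a hypothetical example, not the paper's model.
    """
    rf = 1  # a single output position sees one input position before any layer
    for dilation in dilations:
        rf += (kernel_size - 1) * dilation
    return rf


# Four layers without dilation versus four atrous layers (assumed rates).
plain = receptive_field(3, [1, 1, 1, 1])
atrous = receptive_field(3, [1, 2, 4, 8])
print(f"without dilation: {plain}, with dilation: {atrous}")
```

Under these assumed settings the dilated stack covers a much wider span of the time-frequency input with the same number of layers and no pooling, so the feature maps keep their full resolution.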

Item Type: Conference or Workshop Item (Conference Paper)
Divisions : Faculty of Engineering and Physical Sciences > Electronic Engineering > Centre for Vision Speech and Signal Processing
Authors :
Ren, Zhao
Kong, Qiuqiang
Han, Jing
Plumbley, Mark
Schuller, Björn W
Date : 1 February 2019
Funders : European Union’s Horizon H2020, EPSRC - Engineering and Physical Sciences Research Council, China Scholarship Council (CSC)
Copyright Disclaimer : © 2019 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.
Uncontrolled Keywords : Deep neural networks; Atrous convolutional neural networks; Attention pooling; Acoustic scene classification
Depositing User : Diane Maxfield
Date Deposited : 27 Feb 2019 17:12
Last Modified : 28 Feb 2019 10:03
