University of Surrey


LD-CNN: A Lightweight Dilated Convolutional Neural Network for Environmental Sound Classification

Zhang, Xiaohu, Zou, Yuexian and Wang, Wenwu (2018) LD-CNN: A Lightweight Dilated Convolutional Neural Network for Environmental Sound Classification. In: International Conference on Pattern Recognition (ICPR 2018), 20-24 Aug 2018, Beijing, China.

LD-CNN.pdf - Accepted version Manuscript


Abstract

Environmental Sound Classification (ESC) plays a vital role in machine auditory scene perception. Deep learning based ESC methods, such as the Dilated Convolutional Neural Network (D-CNN), have achieved state-of-the-art results on public datasets. However, the D-CNN ESC model is often larger than 100MB and is only suitable for systems with powerful GPUs, which prevents its application in handheld devices. In this study, we take the D-CNN ESC framework and focus on reducing the model size while maintaining the ESC performance. As a result, a lightweight D-CNN (termed LD-CNN) ESC system is developed. Our contribution is twofold. First, we propose to reduce the number of parameters in the convolution layers by factorizing each two-dimensional convolution filter (L×W) into two separable one-dimensional convolution filters (L×1 and 1×W). Second, we propose to replace the first fully connected layer (FCL) with a Feature Sum layer (FSL) to further reduce the number of parameters. This is motivated by our finding that the features of environmental sounds have only a weak absolute locality property, so a global sum operation can be applied to compress the feature map. Experiments on three public datasets (ESC50, UrbanSound8K, and CICESE) show that the proposed system offers comparable classification performance with a much smaller model size. For example, the model size of our proposed system is about 2.05MB, which is 50 times smaller than the original D-CNN model, at a cost of only 1%-2% in classification accuracy.
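
The two parameter-saving ideas in the abstract can be illustrated with a small counting sketch. It is not the authors' implementation: the channel counts, filter sizes, feature-map shape and hidden width below are hypothetical values chosen only for illustration, and the helper names are invented here. It also assumes that the intermediate L×1 layer keeps the same number of output channels, and that a dense layer follows the Feature Sum layer; the abstract states neither.

```python
def conv2d_params(c_in, c_out, length, width):
    """Weights plus biases of one full L x W convolution layer."""
    return c_in * c_out * length * width + c_out


def factorized_conv_params(c_in, c_out, length, width):
    """Weights plus biases of an L x 1 layer followed by a 1 x W layer.

    Assumption: the intermediate L x 1 layer outputs c_out channels.
    """
    first = c_in * c_out * length + c_out    # L x 1 filters
    second = c_out * c_out * width + c_out   # 1 x W filters
    return first + second


def dense_params(n_in, n_out):
    """Weights plus biases of a fully connected layer."""
    return n_in * n_out + n_out


def feature_sum(feature_map):
    """Global sum per channel over a [channel][time][freq] nested list.

    The operation has no trainable parameters; it only compresses
    each channel's map to a single number.
    """
    return [sum(sum(row) for row in channel) for channel in feature_map]


# Hypothetical layer sizes, not taken from the paper.
C_IN, C_OUT, L, W = 64, 64, 5, 5
full = conv2d_params(C_IN, C_OUT, L, W)
factorized = factorized_conv_params(C_IN, C_OUT, L, W)

# Hypothetical final feature map (channels x time x freq) and hidden width.
CHANNELS, TIME, FREQ, HIDDEN = 64, 10, 10, 1024
dense_on_flattened = dense_params(CHANNELS * TIME * FREQ, HIDDEN)
dense_after_sum = dense_params(CHANNELS, HIDDEN)

print("full conv:", full, "factorized:", factorized)
print("dense on flattened map:", dense_on_flattened)
print("dense after feature sum:", dense_after_sum)
print("feature sum demo:", feature_sum([[[1, 2], [3, 4]], [[0.5, 0.5], [1, 1]]]))
```

Under these assumptions, the per-filter weight count drops from L·W to roughly L+W, and a layer placed after the sum sees one input per channel instead of one per feature-map position. The actual savings reported in the abstract depend on the real LD-CNN architecture, which this sketch does not reproduce.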

Item Type: Conference or Workshop Item (Conference Poster)
Divisions : Faculty of Engineering and Physical Sciences > Electronic Engineering
Authors :
Name | Email | ORCID
Zhang, Xiaohu | |
Zou, Yuexian | |
Wang, Wenwu | W.Wang@surrey.ac.uk |
Date : 20 August 2018
Copyright Disclaimer : Copyright 2018 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.
Uncontrolled Keywords : Environmental Sound Classification; Convolutional Neural Network; Lightweight Dilated Convolutional Neural Network; Spatial Factorization Convolution Layer; FeatureSum Layer
Depositing User : Clive Harris
Date Deposited : 19 Sep 2018 09:07
Last Modified : 19 Sep 2018 09:15
URI: http://epubs.surrey.ac.uk/id/eprint/849351
