University of Surrey


Automatic large-scale classification of bird sounds is strongly improved by unsupervised feature learning

Stowell, D and Plumbley, MD (2014) Automatic large-scale classification of bird sounds is strongly improved by unsupervised feature learning. PeerJ, 2, e488.

Files:

1405.6524v1.pdf (Text, 1MB) - Submitted version (pre-print). Available under licence: see the attached licence file.

Stowell14-peerj.pdf (Text, 4MB) - Published version (publisher's proof or final PDF). Available under licence: see the attached licence file.

SRI_deposit_agreement.pdf (PDF, 33kB) - Licence (deposit agreement).

Abstract

Automatic species classification of birds from their sound is a computational tool of increasing importance in ecology, conservation monitoring and vocal communication studies. To make classification useful in practice, it is crucial to improve its accuracy while ensuring that it can run at big data scales. Many approaches use acoustic measures based on spectrogram-type data, such as the Mel-frequency cepstral coefficient (MFCC) features which represent a manually-designed summary of spectral information. However, recent work in machine learning has demonstrated that features learnt automatically from data can often outperform manually-designed feature transforms. Feature learning can be performed at large scale and "unsupervised", meaning it requires no manual data labelling, yet it can improve performance on "supervised" tasks such as classification. In this work we introduce a technique for feature learning from large volumes of bird sound recordings, inspired by techniques that have proven useful in other domains. We experimentally compare twelve different feature representations derived from the Mel spectrum (of which six use this technique), using four large and diverse databases of bird vocalisations, classified using a random forest classifier. We demonstrate that in our classification tasks, MFCCs can often lead to worse performance than the raw Mel spectral data from which they are derived. Conversely, we demonstrate that unsupervised feature learning provides a substantial boost over MFCCs and Mel spectra without adding computational complexity after the model has been trained. The boost is particularly notable for single-label classification tasks at large scale. The spectro-temporal activations learned through our procedure resemble spectro-temporal receptive fields calculated from avian primary auditory forebrain. However, for one of our datasets, which contains substantial audio data but few annotations, increased performance is not discernible. 
We study the interaction between dataset characteristics and choice of feature representation through further empirical analysis.
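The abstract does not name the specific feature-learning algorithm; as a rough, hypothetical sketch of the kind of pipeline described (unsupervised learning of spectro-temporal bases from Mel spectra, whose activations then replace MFCCs as classifier input), spherical k-means over spectro-temporal patches is a common choice in such work. All parameters below (patch length, number of bases, data sizes) are illustrative assumptions, not values from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy "Mel spectrogram": 200 frames x 40 Mel bands (stand-in for real audio).
spec = rng.random((200, 40))

# 1. Sample spectro-temporal patches (4 consecutive frames each) and flatten.
patch_len = 4
idx = rng.integers(0, spec.shape[0] - patch_len, size=500)
patches = np.stack([spec[i:i + patch_len].ravel() for i in idx])

# 2. Normalise each patch to unit length ("spherical" preprocessing).
patches /= np.linalg.norm(patches, axis=1, keepdims=True)

# 3. Spherical k-means: assign by cosine similarity, re-normalise centroids.
k = 8
centroids = patches[rng.choice(len(patches), k, replace=False)].copy()
for _ in range(20):
    assign = (patches @ centroids.T).argmax(axis=1)  # nearest basis by cosine
    for j in range(k):
        members = patches[assign == j]
        if len(members):
            c = members.sum(axis=0)
            centroids[j] = c / np.linalg.norm(c)

# 4. Project every patch position onto the learned bases; these activation
#    series are the learned features a classifier (e.g. a random forest)
#    would consume in place of MFCCs.
frames = np.stack([spec[i:i + patch_len].ravel()
                   for i in range(spec.shape[0] - patch_len)])
frames /= np.linalg.norm(frames, axis=1, keepdims=True)
activations = np.maximum(frames @ centroids.T, 0.0)  # rectified projections
print(activations.shape)  # one k-dimensional feature vector per frame position
```

Once the bases are learned, computing activations for new audio is a single matrix product per frame, which is consistent with the abstract's point that the method adds no computational complexity after training.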

Item Type: Article
Divisions : Faculty of Engineering and Physical Sciences > Electronic Engineering > Centre for Vision Speech and Signal Processing
Authors : Stowell, D; Plumbley, MD
Date : 17 July 2014
Identification Number : 10.7717/peerj.488
Uncontrolled Keywords : Bioacoustics, Birds, Birdsong, Classification, Machine learning, Vocalisation
Related URLs :
Additional Information : PeerJ is an Open Access journal.
Depositing User : Symplectic Elements
Date Deposited : 12 May 2015 10:09
Last Modified : 03 Nov 2015 15:44
URI: http://epubs.surrey.ac.uk/id/eprint/807401


© The University of Surrey, Guildford, Surrey, GU2 7XH, United Kingdom.
+44 (0)1483 300800