University of Surrey


Acoustic scene generation with conditional SampleRNN

Kong, Qiuqiang, Xu, Yong, Iqbal, Turab, Cao, Yin, Wang, Wenwu and Plumbley, Mark D. (2019) Acoustic scene generation with conditional SampleRNN. In: 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP 2019), 12-17 May 2019, Brighton, UK.

Acoustic scene generation with conditional SampleRNN.pdf - Accepted version Manuscript


Abstract

Acoustic scene generation (ASG) is the task of generating waveforms for acoustic scenes. ASG can be used to generate audio scenes for movies and computer games. Recently, neural networks such as SampleRNN have been used for speech and music generation. However, ASG is more challenging because acoustic scenes vary widely. In addition, evaluating a generative model is difficult. In this paper, we propose to use a conditional SampleRNN model to generate acoustic scenes conditioned on the input classes. We also propose objective criteria, based on classification accuracy, to evaluate the quality and diversity of the generated samples. Experiments on the DCASE 2016 Task 1 acoustic scene data show that a classification accuracy of 65.5% is achieved on the generated audio samples, compared with 6.7% for samples generated by a random model and 83.1% for samples from real recordings. A classifier trained only on generated samples achieves an accuracy of 51.3%, as opposed to 6.7% with samples generated by a random model.
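The two accuracy-based criteria mentioned in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The abstract does not say which classifier or features are used, so the sketch assumes a nearest-centroid classifier on toy two-dimensional features. It also assumes that quality is measured by training on real data and testing on generated data, and diversity by training on generated data and testing on real data. The class names, the helper functions and the synthetic data are all hypothetical.

```python
import random
from collections import defaultdict

def fit_centroids(features, labels):
    """Stand-in classifier (assumption): one mean feature vector per class."""
    sums, counts = {}, defaultdict(int)
    for x, y in zip(features, labels):
        if y not in sums:
            sums[y] = [0.0] * len(x)
        sums[y] = [s + v for s, v in zip(sums[y], x)]
        counts[y] += 1
    return {y: [s / counts[y] for s in sums[y]] for y in sums}

def predict(centroids, x):
    """Return the class whose centroid is closest in squared Euclidean distance."""
    return min(
        centroids,
        key=lambda y: sum((a - b) ** 2 for a, b in zip(centroids[y], x)),
    )

def accuracy(centroids, features, labels):
    """Fraction of samples whose predicted class matches the given label."""
    hits = sum(predict(centroids, x) == y for x, y in zip(features, labels))
    return hits / len(labels)

def toy_set(rng, class_means, n_per_class, noise):
    """Hypothetical data: Gaussian feature vectors around each class mean."""
    feats, labs = [], []
    for label, mean in class_means.items():
        for _ in range(n_per_class):
            feats.append([m + rng.gauss(0.0, noise) for m in mean])
            labs.append(label)
    return feats, labs

rng = random.Random(0)
# Illustrative class names and feature means, not taken from the paper.
means = {"park": [0.0, 0.0], "bus": [5.0, 5.0], "cafe": [0.0, 5.0]}
real_x, real_y = toy_set(rng, means, 50, 0.5)  # stands in for real recordings
gen_x, gen_y = toy_set(rng, means, 50, 1.0)    # stands in for generated audio

# Quality (assumed protocol): train on real data, test on generated samples
# labelled with the class they were conditioned on.
quality = accuracy(fit_centroids(real_x, real_y), gen_x, gen_y)

# Diversity (assumed protocol): train only on generated samples, test on real data.
diversity = accuracy(fit_centroids(gen_x, gen_y), real_x, real_y)

print(f"quality={quality:.3f} diversity={diversity:.3f}")
```

In this reading, a low quality score suggests the generated audio does not resemble its target class, while a low diversity score suggests the generated samples cover too little of the variation found in real recordings for a classifier to generalise from them.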

Item Type: Conference or Workshop Item (Conference Paper)
Divisions : Faculty of Engineering and Physical Sciences > Electronic Engineering
Authors :
Name                 Email                     ORCID
Kong, Qiuqiang       q.kong@surrey.ac.uk
Xu, Yong             yong.xu@surrey.ac.uk
Iqbal, Turab         t.iqbal@surrey.ac.uk
Cao, Yin             yin.cao@surrey.ac.uk
Wang, Wenwu          W.Wang@surrey.ac.uk
Plumbley, Mark D.    m.plumbley@surrey.ac.uk
Date : 2019
Funders : Engineering and Physical Sciences Research Council (EPSRC)
Grant Title : Making Sense of Sounds
Copyright Disclaimer : © 2019 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.
Uncontrolled Keywords : Acoustic scene generation; SampleRNN; Recurrent neural network; Generative model
Depositing User : Clive Harris
Date Deposited : 20 Mar 2019 09:21
Last Modified : 20 Mar 2019 10:23
URI: http://epubs.surrey.ac.uk/id/eprint/850808


© The University of Surrey, Guildford, Surrey, GU2 7XH, United Kingdom.
+44 (0)1483 300800