University of Surrey


Learning soft mask with DNN and DNN-SVM for multi-speaker DOA estimation using an acoustic vector sensor

Wang, D, Zou, Y and Wang, Wenwu (2017) Learning soft mask with DNN and DNN-SVM for multi-speaker DOA estimation using an acoustic vector sensor. Journal of The Franklin Institute, 355 (4). pp. 1692-1709.

WangZW_2017_JFI.pdf - Accepted version Manuscript



Using an acoustic vector sensor (AVS), an efficient method for direction-of-arrival (DOA) estimation of multiple speech sources has recently been presented, based on clustering the inter-sensor data ratio (AVS-ISDR). Through extensive experiments on simulated and recorded data, we observed that the performance of the AVS-DOA method depends largely on the reliable extraction of the target-speech-dominated time-frequency points (TD-TFPs), which, however, may be degraded as the level of additive background noise and room reverberation increases. In this paper, inspired by the great success of deep learning in speech recognition, we design two new soft mask learners, namely a deep neural network (DNN) and a DNN cascaded with a support vector machine (DNN-SVM), for multi-source DOA estimation, where a novel feature, the tandem local spectrogram block (TLSB), is used as the input to the system. Using our proposed soft mask learners, the TD-TFPs can be accurately extracted under different noisy and reverberant conditions. Additionally, the generated soft masks can be used to calculate weighted centers of the ISDR clusters, giving better DOA estimation than the original centers used in our previously proposed AVS-ISDR method. Extensive experiments on simulated and recorded data show the improved performance of our proposed methods over two baseline AVS-DOA methods in the presence of noise and reverberation.
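The idea of a soft-mask weighted cluster center can be illustrated with a short Python sketch. It is not the authors' implementation. It assumes the textbook model of an ideal two-dimensional AVS (one pressure channel o and two orthogonal particle-velocity channels u and v), under which the ISDRs X_u/X_o and X_v/X_o at a TF point dominated by a single source approximate cos θ and sin θ of that source's azimuth θ. The sketch handles one ISDR cluster only (the clustering step is omitted), the mask is simply a per-point weight rather than the output of the DNN or DNN-SVM learners, and the names `weighted_isdr_doa` and `simulate_point` are hypothetical.

```python
import math


def weighted_isdr_doa(x_o, x_u, x_v, mask=None, eps=1e-12):
    """Estimate one source's azimuth (degrees) from AVS STFT values.

    Illustrative sketch under an assumed ideal 2-D AVS model: at a TF point
    dominated by one source at azimuth theta, X_u / X_o ~= cos(theta) and
    X_v / X_o ~= sin(theta).

    x_o, x_u, x_v: complex STFT values of the pressure (o) and the two
        particle-velocity (u, v) channels at the TF points of one cluster.
    mask: soft-mask weights in [0, 1], one per TF point; None gives uniform
        weights, i.e. the plain (unweighted) cluster center.
    """
    if mask is None:
        mask = [1.0] * len(x_o)
    sum_u = sum_v = total = 0.0
    for xo, xu, xv, m in zip(x_o, x_u, x_v, mask):
        if abs(xo) < eps:  # the ratio is undefined without pressure energy
            continue
        sum_u += m * (xu / xo).real  # weighted u/o inter-sensor data ratio
        sum_v += m * (xv / xo).real  # weighted v/o inter-sensor data ratio
        total += m
    if total <= 0.0:
        raise ValueError("mask gives no weight to any usable TF point")
    # The weighted center is (sum_u / total, sum_v / total); atan2 ignores the
    # common positive factor 1 / total, so it is not applied explicitly.
    return math.degrees(math.atan2(sum_v, sum_u)) % 360.0


def simulate_point(amplitude, azimuth_deg):
    """Hypothetical helper: ideal AVS observation (o, u, v) of one TF point."""
    a = math.radians(azimuth_deg)
    return amplitude, amplitude * math.cos(a), amplitude * math.sin(a)


if __name__ == "__main__":
    # Two TF points dominated by a talker at 60 degrees, one by noise at 200.
    points = [simulate_point(1 + 2j, 60), simulate_point(0.3 - 1j, 60),
              simulate_point(0.5 - 1j, 200)]
    x_o, x_u, x_v = (list(ch) for ch in zip(*points))
    print("unweighted center:", weighted_isdr_doa(x_o, x_u, x_v))
    print("soft-mask weighted:",
          weighted_isdr_doa(x_o, x_u, x_v, [0.9, 0.8, 0.1]))
```

In this toy example, uniform weights let the noise-dominated point pull the estimate well away from 60 degrees, while giving it a small mask weight moves the estimate back toward the talker. This is the kind of improvement the weighted centers are meant to provide.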

Item Type: Article
Divisions : Faculty of Engineering and Physical Sciences > Electronic Engineering
Authors :
Wang, D
Zou, Y
Wang, W
Date : 12 May 2017
Funders : Engineering and Physical Sciences Research Council (EPSRC)
DOI : 10.1016/j.jfranklin.2017.05.002
Copyright Disclaimer : © 2017. This manuscript version is made available under the CC-BY-NC-ND 4.0 license
Uncontrolled Keywords : DOA estimation; Tandem local spectrogram block; Soft mask; Deep neural network; Support vector machine
Depositing User : Melanie Hughes
Date Deposited : 24 May 2017 10:06
Last Modified : 11 Dec 2018 11:23





© The University of Surrey, Guildford, Surrey, GU2 7XH, United Kingdom.
+44 (0)1483 300800