University of Surrey


Multiple Speaker Tracking in Spatial Audio via PHD Filtering and Depth-Audio Fusion

Liu, Qingju, Wang, Wenwu, de Campos, Teofilo, Jackson, Philip and Hilton, Adrian (2017) Multiple Speaker Tracking in Spatial Audio via PHD Filtering and Depth-Audio Fusion. IEEE Transactions on Multimedia, 20 (7). pp. 1767-1780.

Multiple Speaker Tracking in Spatial Audio via PHD Filtering and Depth-Audio Fusion.pdf - Author's Original



In an object-based spatial audio system, the positions of the audio objects (e.g. speakers/talkers or voices) presented in the sound scene are required as important metadata attributes for object acquisition and reproduction. Binaural microphones are often used as a physical device to mimic human hearing and to monitor and analyse the scene, including the localisation and tracking of multiple speakers. The binaural audio tracker, however, is usually prone to errors caused by room reverberation and background noise. To address this limitation, we present a multimodal tracking method that fuses the binaural audio with depth information (from a depth sensor, e.g. Kinect). More specifically, the probability hypothesis density (PHD) filtering framework is first applied to the depth stream, and a novel clutter intensity model is proposed to improve the robustness of the PHD filter when an object is occluded, either by other objects or because of the limited field of view of the depth sensor. To compensate for mis-detections in the depth stream, a novel gap-filling technique is presented that maps audio azimuths obtained from the binaural audio tracker to 3D positions, using speaker-dependent spatial constraints learned from the depth stream. With the proposed method, both the errors in the binaural tracker and the mis-detections in the depth tracker can be significantly reduced. Real-room recordings are used to show the improved performance of the proposed method in removing outliers and reducing mis-detections.
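The gap-filling idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not say which spatial constraints are learned, so the sketch assumes, purely for illustration, that each speaker keeps a roughly constant horizontal range and height, both estimated from that speaker's depth detections. It also assumes the binaural microphones and the depth sensor share one coordinate frame (x to the right, y up, z forward, in metres) and that azimuth is measured in degrees from straight ahead, positive to the right. All function names are hypothetical.

```python
import math
from statistics import mean


def learn_constraints(depth_positions):
    """Estimate a speaker's horizontal range and height from depth detections.

    Assumption (not from the paper): range and height stay roughly constant,
    so their means serve as the speaker-dependent spatial constraints.
    """
    ranges = [math.hypot(x, z) for x, _, z in depth_positions]
    heights = [y for _, y, _ in depth_positions]
    return mean(ranges), mean(heights)


def azimuth_to_position(azimuth_deg, horizontal_range, height):
    """Map an audio azimuth to a 3D point under the learned constraints."""
    a = math.radians(azimuth_deg)
    return (horizontal_range * math.sin(a), height, horizontal_range * math.cos(a))


def fill_gaps(depth_track, azimuths_deg):
    """Replace missed depth detections (None) with azimuth-derived positions.

    depth_track and azimuths_deg are frame-aligned lists for one speaker.
    """
    observed = [p for p in depth_track if p is not None]
    if not observed:
        raise ValueError("no depth detections to learn constraints from")
    horizontal_range, height = learn_constraints(observed)
    return [
        p if p is not None else azimuth_to_position(a, horizontal_range, height)
        for p, a in zip(depth_track, azimuths_deg)
    ]
```

For example, `fill_gaps([(0.0, 1.5, 2.0), None, (0.0, 1.5, 2.0)], [0.0, 90.0, 0.0])` keeps the two depth detections and fills the middle frame with a point 2 m to the speaker's right at a height of 1.5 m. A real system would also need to associate azimuths with speakers and handle noise in both streams, which this sketch leaves out.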

Item Type: Article
Divisions : Faculty of Engineering and Physical Sciences > Electronic Engineering
Authors : Liu, Qingju; Wang, Wenwu; de Campos, Teofilo; Jackson, Philip; Hilton, Adrian
Date : 24 November 2017
DOI : 10.1109/TMM.2017.2777671
Copyright Disclaimer : © 2017 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works
Uncontrolled Keywords : Multi-person tracking; Spatial audio; Binaural microphones; Depth sensor; Depth and audio; PHD filtering
Depositing User : Clive Harris
Date Deposited : 15 Nov 2017 15:48
Last Modified : 10 Jul 2018 09:41


© The University of Surrey, Guildford, Surrey, GU2 7XH, United Kingdom.
+44 (0)1483 300800