University of Surrey

Test tubes in the lab Research in the ATI Dance Research

Mean-Shift and Sparse Sampling Based SMC-PHD Filtering for Audio Informed Visual Speaker Tracking

Kilic, V, Barnard, M, Wang, W, Hilton, A and Kittler, J (2016) Mean-Shift and Sparse Sampling Based SMC-PHD Filtering for Audio Informed Visual Speaker Tracking IEEE Transactions on Multimedia.

[img]
Preview
Text
wang_double_2016.pdf - Accepted version Manuscript
Available under License : See the attached licence file.

Download (11MB) | Preview
[img]
Preview
PDF (licence)
SRI_deposit_agreement.pdf
Available under License : See the attached licence file.

Download (33kB) | Preview

Abstract

The probability hypothesis density (PHD) filter based on sequential Monte Carlo (SMC) approximation (also known as SMC-PHD filter) has proven to be a promising algorithm for multi-speaker tracking. However, it has a heavy computational cost as surviving, spawned and born particles need to be distributed in each frame to model the state of the speakers and to estimate jointly the variable number of speakers with their states. In particular, the computational cost is mostly caused by the born particles as they need to be propagated over the entire image in every frame to detect the new speaker presence in the view of the visual tracker. In this paper, we propose to use audio data to improve the visual SMC-PHD (VSMC- PHD) filter by using the direction of arrival (DOA) angles of the audio sources to determine when to propagate the born particles and re-allocate the surviving and spawned particles. The tracking accuracy of the AV-SMC-PHD algorithm is further improved by using a modified mean-shift algorithm to search and climb density gradients iteratively to find the peak of the probability distribution, and the extra computational complexity introduced by mean-shift is controlled with a sparse sampling technique. These improved algorithms, named as AVMS-SMCPHD and sparse-AVMS-SMC-PHD respectively, are compared systematically with AV-SMC-PHD and V-SMC-PHD based on the AV16.3, AMI and CLEAR datasets.

Item Type: Article
Subjects : Electronic Engineering
Divisions : Faculty of Engineering and Physical Sciences > Electronic Engineering > Centre for Vision Speech and Signal Processing
Authors :
AuthorsEmailORCID
Kilic, VUNSPECIFIEDUNSPECIFIED
Barnard, MUNSPECIFIEDUNSPECIFIED
Wang, WUNSPECIFIEDUNSPECIFIED
Hilton, AUNSPECIFIEDUNSPECIFIED
Kittler, JUNSPECIFIEDUNSPECIFIED
Date : 2016
Funders : EPSRC
Copyright Disclaimer : (c) 2016 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other users, including reprinting/ republishing this material for advertising or promotional purposes, creating new collective works for resale or redistribution to servers or lists, or reuse of any copyrighted components of this work in other works.
Related URLs :
Depositing User : Symplectic Elements
Date Deposited : 10 Aug 2016 11:02
Last Modified : 10 Aug 2016 11:02
URI: http://epubs.surrey.ac.uk/id/eprint/811658

Actions (login required)

View Item View Item

Downloads

Downloads per month over past year


Information about this web site

© The University of Surrey, Guildford, Surrey, GU2 7XH, United Kingdom.
+44 (0)1483 300800