University of Surrey


Fusing Visual and Inertial Sensors with Semantics for 3D Human Pose Estimation

Gilbert, Andrew, Trumble, Matthew, Malleson, Charles, Hilton, Adrian and Collomosse, John (2018) Fusing Visual and Inertial Sensors with Semantics for 3D Human Pose Estimation. International Journal of Computer Vision.

Fusing Visual and Inertial Sensors with Semantics for 3D Human Pose Estimation.pdf - Version of Record
Available under License Creative Commons Attribution.



We propose an approach to accurately estimate 3D human pose by fusing multi-viewpoint video (MVV) with inertial measurement unit (IMU) sensor data, without optical markers, a complex hardware setup or a full body model. Uniquely, we use a multi-channel 3D convolutional neural network to learn a pose embedding from visual occupancy and semantic 2D pose estimates from the MVV in a discretised volumetric probabilistic visual hull (PVH). The learnt pose stream is processed concurrently with a forward kinematic solve of the IMU data, and a temporal model (LSTM) exploits the rich spatial and temporal long-range dependencies among the solved joints. The two streams are then fused in a final fully connected layer. The two complementary data sources allow ambiguities to be resolved within each sensor modality, yielding improved accuracy over prior methods. Extensive evaluation is performed, with state-of-the-art performance reported on the popular Human 3.6M dataset [26], the newly released TotalCapture dataset and a challenging set of outdoor videos, TotalCaptureOutdoor. We release the new hybrid MVV dataset (TotalCapture), comprising multi-viewpoint video, IMU and accurate 3D skeletal joint ground truth derived from a commercial motion capture system. The dataset is available online at
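To make the "forward kinematic solve of the IMU data" step more concrete, the following is a minimal sketch of forward kinematics in NumPy. It is not the authors' implementation. The three-joint arm, the bone lengths, the helper `rot_z` and the convention that each bone carries one calibrated global orientation are all assumptions chosen for illustration.

```python
import numpy as np

def rot_z(deg):
    """Rotation matrix for a rotation of `deg` degrees about the z axis."""
    t = np.deg2rad(deg)
    c, s = np.cos(t), np.sin(t)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])

def forward_kinematics(parents, rest_offsets, global_rots, root_pos):
    """Compute joint positions from per-bone global orientations.

    parents[j]      index of the parent of joint j, -1 for the root
    rest_offsets[j] offset of joint j from its parent in the rest pose
    global_rots[j]  3x3 global orientation of the bone ending at joint j
                    (assumed here to come from a calibrated IMU on that bone)
    root_pos        3D position of the root joint
    """
    positions = np.zeros((len(parents), 3))
    # Assumes joints are ordered so that every parent precedes its children.
    for j, p in enumerate(parents):
        if p < 0:
            positions[j] = root_pos
        else:
            # Rotate the rest-pose bone vector, then attach it to the parent.
            positions[j] = positions[p] + global_rots[j] @ rest_offsets[j]
    return positions

# Hypothetical arm: shoulder (root) -> elbow -> wrist, bones along +x at rest.
parents = [-1, 0, 1]
rest_offsets = np.array([[0.0, 0.0, 0.0],
                         [0.30, 0.0, 0.0],   # upper arm, 0.30 m (assumed)
                         [0.25, 0.0, 0.0]])  # forearm, 0.25 m (assumed)
# Upper arm rotated 90 degrees about z, forearm left in its rest orientation.
global_rots = [np.eye(3), rot_z(90.0), np.eye(3)]

joints = forward_kinematics(parents, rest_offsets, global_rots, np.zeros(3))
print(np.round(joints, 3))
```

With these assumed inputs the elbow lands at roughly (0, 0.30, 0) and the wrist at roughly (0.25, 0.30, 0). In the pipeline the abstract describes, joint positions solved in this way form one stream, which is fused with the pose stream learnt from the PVH.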

Item Type: Article
Divisions : Faculty of Arts and Social Sciences > Department of Music and Media
Authors :
Date : 8 September 2018
Funders : Engineering and Physical Sciences Research Council (EPSRC)
DOI : 10.1007/s11263-018-1118-y
Grant Title : The Total Capture project
Copyright Disclaimer : This is a post-peer-review, pre-copyedit version of an article published in International Journal of Computer Vision. The final authenticated version is available online at:
Uncontrolled Keywords : 3D pose estimation; Sensor fusion; Deep neural networks; Multi viewpoint video; Inertial measurement units
Depositing User : Clive Harris
Date Deposited : 31 Aug 2018 09:16
Last Modified : 05 Mar 2019 10:13


© The University of Surrey, Guildford, Surrey, GU2 7XH, United Kingdom.
+44 (0)1483 300800