Fusing Visual and Inertial Sensors with Semantics for 3D Human Pose Estimation
Gilbert, Andrew, Trumble, Matthew, Malleson, Charles, Hilton, Adrian and Collomosse, John (2018) Fusing Visual and Inertial Sensors with Semantics for 3D Human Pose Estimation. International Journal of Computer Vision.
Text: Fusing Visual and Inertial Sensors with Semantics for 3D Human Pose Estimation.pdf - Version of Record. Available under License Creative Commons Attribution.
Abstract
We propose an approach to accurately estimate 3D human pose by fusing multi-viewpoint video (MVV) with inertial measurement unit (IMU) sensor data, without optical markers, a complex hardware setup or a full body model. Uniquely, we use a multi-channel 3D convolutional neural network to learn a pose embedding from visual occupancy and semantic 2D pose estimates from the MVV in a discretised volumetric probabilistic visual hull (PVH). The learnt pose stream is processed concurrently with a forward kinematic solve of the IMU data, and a temporal model (LSTM) exploits the rich spatial and temporal long-range dependencies among the solved joints; the two streams are then fused in a final fully connected layer. The two complementary data sources allow ambiguities within each sensor modality to be resolved, yielding improved accuracy over prior methods. Extensive evaluation is performed, with state-of-the-art performance reported on the popular Human 3.6M dataset [26], the newly released TotalCapture dataset and a challenging set of outdoor videos, TotalCaptureOutdoor. We release the new hybrid MVV dataset (TotalCapture), comprising multi-viewpoint video, IMU data and accurate 3D skeletal joint ground truth derived from a commercial motion capture system. The dataset is available online at http://cvssp.org/data/totalcapture/.
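As a rough illustration of two steps named in the abstract, the NumPy sketch below shows a forward kinematic solve that turns per-joint global orientations (such as those an IMU might provide) into joint positions, followed by a single linear layer that fuses two joint estimates. It is not the authors' implementation. The function names, the toy three-joint chain, the convention that each joint offset is rotated by its parent's global orientation, and the fixed averaging weights are all assumptions made for illustration; in the paper the fusion weights are learned, and the 3D CNN and LSTM components are omitted here.

```python
import numpy as np


def rot_z(theta):
    """3x3 rotation matrix for an angle theta (radians) about the z axis."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])


def forward_kinematics(parents, offsets, global_rots, root_pos):
    """Solve joint positions from global orientations (hypothetical helper).

    parents[j]     : index of the parent of joint j, -1 for the root;
                     parents are assumed to appear before their children.
    offsets[j]     : rest-pose vector from the parent to joint j, expressed
                     in the parent's local frame.
    global_rots[j] : 3x3 global orientation of joint j's frame.
    root_pos       : 3-vector giving the root joint position.
    """
    positions = np.zeros((len(parents), 3))
    for j, p in enumerate(parents):
        if p < 0:
            positions[j] = root_pos
        else:
            # The bone to joint j is carried by the parent's orientation.
            positions[j] = positions[p] + global_rots[p] @ offsets[j]
    return positions


def fuse(visual_joints, imu_joints, W, b):
    """Fuse two (J, 3) joint estimates with one fully connected layer.

    W has shape (3J, 6J) and b has shape (3J,). In a trained model these
    would be learned; here they are supplied by the caller.
    """
    x = np.concatenate([visual_joints.ravel(), imu_joints.ravel()])
    return (W @ x + b).reshape(visual_joints.shape)


# Toy chain: root -> joint 1 -> joint 2, each bone one unit long along x.
parents = [-1, 0, 1]
offsets = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [1.0, 0.0, 0.0]])
global_rots = [rot_z(np.pi / 2), np.eye(3), np.eye(3)]
imu_joints = forward_kinematics(parents, offsets, global_rots, np.zeros(3))

# Stand-in for the visual stream: the IMU estimate shifted by a fixed error.
visual_joints = imu_joints + 0.1

# Placeholder weights that simply average the two streams.
n = imu_joints.size
W = np.hstack([0.5 * np.eye(n), 0.5 * np.eye(n)])
b = np.zeros(n)
fused = fuse(visual_joints, imu_joints, W, b)
```

With the averaging weights used here, the fused estimate lies halfway between the two inputs; a learned layer would instead weight each stream according to where it tends to be reliable.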
Item Type: | Article |
---|---|
Divisions : | Faculty of Arts and Social Sciences > Department of Music and Media |
Authors : | Gilbert, Andrew; Trumble, Matthew; Malleson, Charles; Hilton, Adrian; Collomosse, John |
Date : | 8 September 2018 |
Funders : | Engineering and Physical Sciences Research Council (EPSRC) |
DOI : | 10.1007/s11263-018-1118-y |
Grant Title : | The Total Capture project |
Copyright Disclaimer : | This is a post-peer-review, pre-copyedit version of an article published in International Journal of Computer Vision. The final authenticated version is available online at: http://dx.doi.org/10.1007/s11263-018-1118-y |
Uncontrolled Keywords : | 3D pose estimation; Sensor fusion; Deep neural networks; Multi viewpoint video; Inertial measurement units |
Depositing User : | Clive Harris |
Date Deposited : | 31 Aug 2018 09:16 |
Last Modified : | 05 Mar 2019 10:13 |
URI: | http://epubs.surrey.ac.uk/id/eprint/849168 |