University of Surrey


Real time hand pose estimation for human computer interaction.

Krejov, Philip G. (2016) Real time hand pose estimation for human computer interaction. Doctoral thesis, University of Surrey.

thesis.pdf - Version of Record
Available under License Creative Commons Attribution Non-commercial Share Alike.



The aim of this thesis is to address the challenge of real-time pose estimation of the hand. Specifically, it aims to determine the joint positions of a non-augmented hand. The thesis focuses on the use of depth, localising the parts of the hand for efficient fitting of a kinematic model, and consists of four main contributions.

The first contribution presents an approach to Multi-touch(less) tracking, where the objective is to track the fingertips with a high degree of accuracy without sensor contact. Using a graph-based approach, the surface of the hand is modelled and the extrema of the hand are located. From these, gestures are identified and used for interaction. We briefly discuss one use case for this technology in the context of the Making Sense demonstrator, inspired by the film "The Minority Report". This demonstration system allows an operator to quickly summarise and explore complex multi-modal multimedia data. The tracking is efficient enough to support collaborative interaction, resolving four hands simultaneously in real time.

The second contribution applies a Randomised Decision Forest (RDF) to the problem of pose estimation and presents a technique to identify regions of the hand, using features that sample depth. The RDF is an ensemble-based classifier that generalises to unseen data and can model expansive datasets, learning from over 70,000 pose examples. The approach is also demonstrated in the challenging application of American Sign Language (ASL) fingerspelling recognition.

The third contribution combines a machine learning approach with a model-based method to overcome the limitations of either technique in isolation. An RDF provides an initial segmentation, allowing surface constraints to be derived for a 3D model, which is subsequently fitted to the segmentation. This stage of global optimisation incorporates temporal information and enforces kinematic constraints. Using Rigid Body Dynamics for optimisation, invalid poses due to self-intersection and segmentation noise are resolved. Accuracy of the approach is limited by the natural variance between users and the use of a generic hand model.

The final contribution therefore proposes an approach to refine pose via cascaded linear regression, which samples the residual error between the depth and the model. This combination of techniques is demonstrated to provide state-of-the-art accuracy in real time, without the use of a GPU and without the requirement for model initialisation.
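
The graph-based search for hand extrema described in the first contribution can be illustrated with a small sketch. The abstract does not specify how the graph is built or how extrema are selected, so everything below is an assumption made for illustration: the depth map is treated as an 8-connected pixel graph, edge weights are Euclidean steps in (row, column, depth), Dijkstra's algorithm gives geodesic distances from a seed pixel, and the farthest reachable pixel is taken as an extremum candidate. The names `geodesic_distances`, `farthest_point` and `max_jump` are hypothetical and do not come from the thesis.

```python
import heapq
import math


def geodesic_distances(depth, seed, max_jump=2.0):
    """Dijkstra over the 8-connected pixel graph of a depth map.

    depth    -- list of rows; None marks background pixels (assumed format)
    seed     -- (row, col) of a foreground pixel, e.g. a palm estimate
    max_jump -- neighbours whose depth differs by more than this are not
                linked, so the path stays on the surface (assumed threshold)
    """
    rows, cols = len(depth), len(depth[0])
    dist = {seed: 0.0}
    heap = [(0.0, seed)]
    while heap:
        d, (r, c) = heapq.heappop(heap)
        if d > dist.get((r, c), math.inf):
            continue  # stale heap entry
        for dr in (-1, 0, 1):
            for dc in (-1, 0, 1):
                nr, nc = r + dr, c + dc
                if dr == 0 and dc == 0:
                    continue
                if not (0 <= nr < rows and 0 <= nc < cols):
                    continue
                if depth[nr][nc] is None:
                    continue
                dz = depth[nr][nc] - depth[r][c]
                if abs(dz) > max_jump:
                    continue
                nd = d + math.sqrt(dr * dr + dc * dc + dz * dz)
                if nd < dist.get((nr, nc), math.inf):
                    dist[(nr, nc)] = nd
                    heapq.heappush(heap, (nd, (nr, nc)))
    return dist


def farthest_point(dist):
    """Return the pixel with the largest geodesic distance from the seed."""
    return max(dist, key=dist.get)


# Toy depth map: a flat "palm" (bottom two rows) with one "finger" pointing up.
N = None
toy_depth = [
    [N, N, 5, N, N],
    [N, N, 5, N, N],
    [N, N, 5, N, N],
    [5, 5, 5, 5, 5],
    [5, 5, 5, 5, 5],
]
toy_dist = geodesic_distances(toy_depth, seed=(4, 2))
print(farthest_point(toy_dist))  # (0, 2), the tip of the toy finger
```

On a real hand, several extrema would be needed (one per fingertip). One common way to obtain them, again an assumption rather than the thesis method, is to repeat the search after adding each found extremum to the set of seeds.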

Item Type: Thesis (Doctoral)
Subjects : Computer Vision, Hand Pose Estimation, Machine Learning
Divisions : Theses
Authors : Krejov, Philip G. (email: krejov100@msn.com, ORCID: 0000-0002-1359-4202)
Date : 29 February 2016
Funders : EPSRC
Contributors :
Depositing User : Philip Krejov
Date Deposited : 01 Mar 2016 10:07
Last Modified : 31 Oct 2017 18:03


© The University of Surrey, Guildford, Surrey, GU2 7XH, United Kingdom.
+44 (0)1483 300800