University of Surrey


Implicit models for automatic pose estimation in static images

Holt, Brian D. (2015) Implicit models for automatic pose estimation in static images. Doctoral thesis, University of Surrey.

brian_holt_phd_thesis.pdf - Thesis (version of record)
Available under License Creative Commons Attribution Non-commercial Share Alike.

2014_08_13_Author_Deposit_Agreement.docx - Other
Available under License Creative Commons Attribution Non-commercial Share Alike.



Automatic human pose estimation is one of the major topics in computer vision. It is a challenging problem, with applications in gaming, human-computer interaction, markerless motion capture, video analysis, and action and gesture recognition. This thesis addresses the problem of automatically estimating the two-dimensional articulated pose of a human in static range images. Implicit models of pose are trained to efficiently predict the body part locations of humans in static images from easily computed depth features. While most prior work has focused on pose estimation in RGB images, range data is used as the basis for this approach because it provides additional information and invariances that can be leveraged to improve estimation accuracy. Three main contributions are each described in their own chapter.

The first contribution proposes a novel method to estimate articulated pose by detecting poselets and accumulating predictions from the detections. A basic assumption throughout the part-based pose estimation literature is that a 'part' should correspond closely to an anatomical subdivision of the body such as 'hand' or 'forearm', but this is not necessarily the most salient feature for visual recognition. A part that corresponds to a highly deformable anatomical region is even more difficult to detect reliably and is susceptible to high levels of false positive detections. By contrast, a description such as 'half a frontal face and shoulder' or 'legs in a scissor shape' may be far easier to detect reliably. The concept of a poselet, defined as a set of parts that are 'tightly clustered in configuration space and appearance space', is employed as the representation, and detectors are trained on poselets extracted from the dataset. Meta-data such as the direction and distance from each poselet to each landmark is stored in a database.
At test time, a multiscale scanning window is applied over the image, and the trained poselet detectors that activate predict their offset meta-data into Hough accumulator images of the landmark locations. Furthermore, an inference step that uses the natural hierarchy of the body improves limb estimation.

The second contribution of this thesis is to cast the pose estimation task as a continuous non-linear regression problem. It is demonstrated that this problem can be effectively addressed by Random Regression Forests. This approach differs from a part-based classification approach in that there are no part detectors at any scale. Instead, the approach is more direct: binary comparison features, computed efficiently at each pixel, are used to vote for body parts. The votes are accumulated in Hough accumulator images, and the most likely hypothesis is taken as the peak in a winner-takes-all approach. A new dataset of aligned range and RGB data, with annotations of 25,000 images over 12 subjects, is also contributed.

The final chapter of this thesis describes a novel conditional regression model based on poselet detectors. A second contribution of this chapter is the development of a geodesic-based method that, combined with estimates of rigid parts, delivers significantly higher predictive accuracy on deformable parts. Intuitively, deformable parts such as the hands correspond to geodesic extrema, which can be found using geodesic distances, leading to a further improvement in the accuracy of the model. A geodesic mesh is constructed from the underlying range data and labels are assigned to geodesic extrema. The proposed method exploits the complementary characteristics of rigid and deformable parts, resulting in a significant improvement in the predictive accuracy of the limbs.
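
The vote accumulation mentioned in the abstract (offsets cast into a Hough accumulator image, followed by a winner-takes-all peak) can be illustrated with a minimal sketch. This is not the thesis implementation: the function names, the `(row, col, score)` detection format, and the use of the detection score as the vote weight are all assumptions made for illustration.

```python
from typing import List, Tuple

Vote = Tuple[int, int, float]        # (row, col, weight), hypothetical format
Detection = Tuple[int, int, float]   # (row, col, score), hypothetical format


def cast_votes(detections: List[Detection],
               offsets: List[Tuple[int, int]]) -> List[Vote]:
    """Turn detections into landmark votes using stored (d_row, d_col) offsets.

    Assumption: each vote is weighted by the detection score.
    """
    votes = []
    for row, col, score in detections:
        for d_row, d_col in offsets:
            votes.append((row + d_row, col + d_col, score))
    return votes


def accumulate_votes(height: int, width: int,
                     votes: List[Vote]) -> List[List[float]]:
    """Sum weighted votes into a height x width accumulator image."""
    acc = [[0.0] * width for _ in range(height)]
    for row, col, weight in votes:
        # Votes that fall outside the image are discarded.
        if 0 <= row < height and 0 <= col < width:
            acc[row][col] += weight
    return acc


def peak(acc: List[List[float]]) -> Tuple[int, int]:
    """Winner-takes-all: return the (row, col) of the largest accumulator cell.

    Ties resolve to the first maximum in row-major order.
    """
    best, best_val = (0, 0), acc[0][0]
    for r, row in enumerate(acc):
        for c, value in enumerate(row):
            if value > best_val:
                best, best_val = (r, c), value
    return best


# Three hypothetical detectors voting for one landmark.
votes = cast_votes([(10, 10, 0.5)], [(5, 2)])     # votes for (15, 12)
votes += cast_votes([(20, 9, 0.4)], [(-5, 3)])    # also votes for (15, 12)
votes += cast_votes([(30, 30, 0.8)], [(1, 1)])    # votes for (31, 31)
acc = accumulate_votes(40, 40, votes)
estimate = peak(acc)  # (15, 12): two agreeing votes outweigh one stronger vote
```

In this sketch, two weaker detections that agree on a location outvote a single stronger detection elsewhere, which is the property that makes accumulating votes robust to isolated false positives.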

Item Type: Thesis (Doctoral)
Divisions : Theses
Authors :
Holt, Brian
Date : 27 February 2015
Contributors :
ContributionNameEmailORCID, R
Uncontrolled Keywords : Pose Estimation, Random Decision Forests, Implicit Models, Poselets, Generalised Hough Transform, Range images, Kinect, RGB-D
Depositing User : Brian Holt
Date Deposited : 09 Mar 2015 10:43
Last Modified : 31 Oct 2017 17:21


© The University of Surrey, Guildford, Surrey, GU2 7XH, United Kingdom.
+44 (0)1483 300800