University of Surrey


Deep Hand: How to Train a CNN on 1 Million Hand Images When Your Data Is Continuous and Weakly Labelled

Koller, O, Ney, H and Bowden, R (2016) Deep Hand: How to Train a CNN on 1 Million Hand Images When Your Data Is Continuous and Weakly Labelled. In: CVPR 2016: IEEE Conference on Computer Vision and Pattern Recognition, 2016-06-26 - 2016-07-01, Las Vegas, NV, USA.

Text
handshapes-cvpr-koller-final.pdf - Accepted version Manuscript
Available under License : See the attached licence file.

PDF (licence)
SRI_deposit_agreement.pdf
Available under License : See the attached licence file.


Abstract

This work presents a new approach to learning a frame-based classifier on weakly labelled sequence data by embedding a CNN within an iterative EM algorithm. This allows the CNN to be trained on a vast number of example images when only loose sequence-level information is available for the source videos. Although we demonstrate this in the context of hand shape recognition, the approach has wider application to any video recognition task where frame-level labelling is not available. The iterative EM algorithm leverages the discriminative ability of the CNN to iteratively refine the frame-level annotation and subsequent training of the CNN. By embedding the classifier within an EM framework, the CNN can easily be trained on 1 million hand images. We demonstrate that the final classifier generalises over both individuals and data sets. The algorithm is evaluated on over 3000 manually labelled hand shape images of 60 different classes, which will be released to the community. Furthermore, we demonstrate its use in continuous sign language recognition on two publicly available large sign language data sets, where it outperforms the current state-of-the-art by a large margin. To our knowledge, no previous work has explored expectation maximization without Gaussian mixture models to exploit weak sequence labels for sign language recognition.
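The alternation described in the abstract (refine frame-level labels with the current classifier, then retrain the classifier on the refined labels) can be illustrated with a toy sketch. This is not the authors' implementation, and the abstract does not give the details of either step. The sketch assumes a linear softmax classifier standing in for the CNN, a "hard" E-step that picks the single best monotone alignment of frames to an ordered label sequence, and an even initial split of frames across labels. All function names (`log_softmax`, `fit_softmax`, `align`, `em_train`) are hypothetical.

```python
import numpy as np

def log_softmax(Z):
    # Numerically stable log-softmax over the class axis.
    Z = Z - Z.max(axis=1, keepdims=True)
    return Z - np.log(np.exp(Z).sum(axis=1, keepdims=True))

def fit_softmax(X, y, n_classes, steps=200, lr=0.5):
    # M-step stand-in: train a linear softmax classifier by gradient descent.
    # (Assumption: a real system would train a CNN here instead.)
    W = np.zeros((X.shape[1], n_classes))
    Y = np.eye(n_classes)[y]
    for _ in range(steps):
        P = np.exp(log_softmax(X @ W))
        W -= lr * X.T @ (P - Y) / len(X)
    return W

def align(log_probs, labels):
    # E-step stand-in: best monotone assignment of T frames to the ordered
    # sequence-level labels (each label covers at least one frame, T >= S).
    T, S = log_probs.shape[0], len(labels)
    score = np.full((T, S), -np.inf)
    back = np.zeros((T, S), dtype=int)
    score[0, 0] = log_probs[0, labels[0]]
    for t in range(1, T):
        for s in range(S):
            stay = score[t - 1, s]
            move = score[t - 1, s - 1] if s > 0 else -np.inf
            back[t, s] = s if stay >= move else s - 1
            score[t, s] = max(stay, move) + log_probs[t, labels[s]]
    # Backtrack from the last label at the last frame.
    path = np.empty(T, dtype=int)
    s = S - 1
    for t in range(T - 1, -1, -1):
        path[t] = labels[s]
        s = back[t, s]
    return path

def em_train(sequences, n_classes, iterations=5):
    # sequences: list of (X, labels); X has shape (T, D), labels is the
    # ordered sequence-level annotation with no frame boundaries.
    # Flat start (assumption): split each sequence evenly among its labels.
    frame_labels = [
        np.array(labels)[np.arange(len(X)) * len(labels) // len(X)]
        for X, labels in sequences
    ]
    for _ in range(iterations):
        W = fit_softmax(
            np.vstack([X for X, _ in sequences]),
            np.concatenate(frame_labels),
            n_classes,
        )
        frame_labels = [
            align(log_softmax(X @ W), labels) for X, labels in sequences
        ]
    return W, frame_labels
```

Calling `em_train` with a list of `(features, ordered_labels)` pairs returns the classifier weights and one refined label per frame. The hard alignment keeps the example short; a soft E-step that weights frames by posterior probability would be an equally valid reading of the abstract.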

Item Type: Conference or Workshop Item (Conference Paper)
Subjects : Electronic Engineering
Divisions : Faculty of Engineering and Physical Sciences > Electronic Engineering > Centre for Vision Speech and Signal Processing
Authors :
Author       Email          ORCID
Koller, O    UNSPECIFIED    UNSPECIFIED
Ney, H       UNSPECIFIED    UNSPECIFIED
Bowden, R    UNSPECIFIED    UNSPECIFIED
Date : 2016
Copyright Disclaimer : © 2016 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.
Depositing User : Symplectic Elements
Date Deposited : 07 Nov 2016 11:26
Last Modified : 07 Nov 2016 11:26
URI: http://epubs.surrey.ac.uk/id/eprint/812764
