University of Surrey

Weakly Supervised Learning with Multi-Stream CNN-LSTM-HMMs to Discover Sequential Parallelism in Sign Language Videos

Koller, Oscar, Camgöz, Necati Cihan, Ney, Hermann and Bowden, Richard (2019) Weakly Supervised Learning with Multi-Stream CNN-LSTM-HMMs to Discover Sequential Parallelism in Sign Language Videos. IEEE Transactions on Pattern Analysis and Machine Intelligence.

koller2019pami.pdf - Accepted version Manuscript


In this work, we present a new approach to weakly supervised learning in the video domain. Our method is relevant to sequence learning problems that can be split into sub-problems occurring in parallel. Here, we experiment with sign language data. The approach exploits sequence constraints within each independent stream and combines the streams by explicitly imposing synchronisation points, making use of the parallelism that all sub-problems share. We do this with multi-stream HMMs, adding intermediate synchronisation constraints among the streams. We embed powerful CNN-LSTM models in each HMM stream following the hybrid approach. This allows the discovery of attributes which on their own lack sufficient discriminative power to be identified. We apply the approach to the domain of sign language recognition, exploiting the sequential parallelism to learn sign language, mouth shape and hand shape classifiers. We evaluate the classifiers on three publicly available benchmark data sets featuring challenging real-life sign language with over 1000 classes, full-sentence lip reading, and articulated hand shape recognition on a fine-grained hand shape taxonomy with over 60 different hand shapes. We clearly outperform the state of the art on all data sets and observe significantly faster convergence using the parallel alignment approach.
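
The synchronisation idea in the abstract can be illustrated with a small sketch. This is not the authors' implementation. It assumes the synchronisation points are already known as frame indices (the paper learns alignments during weakly supervised training), it replaces the CNN-LSTM outputs with hand-written log scores, and it uses a plain left-to-right topology in which every state must be visited. The function names `forced_align` and `align_streams` and the stream names `hand` and `mouth` are hypothetical.

```python
import math


def forced_align(log_emis, n_states):
    """Viterbi forced alignment of T frames to a left-to-right state chain.

    log_emis[t][s] is the log score of frame t under state s. Every state
    is visited at least once, in order. Returns one state index per frame.
    """
    n_frames = len(log_emis)
    if n_frames < n_states:
        raise ValueError("need at least one frame per state")
    neg_inf = -math.inf
    score = [[neg_inf] * n_states for _ in range(n_frames)]
    back = [[0] * n_states for _ in range(n_frames)]
    score[0][0] = log_emis[0][0]  # the path must start in the first state
    for t in range(1, n_frames):
        for s in range(n_states):
            stay = score[t - 1][s]
            move = score[t - 1][s - 1] if s > 0 else neg_inf
            best, prev = (move, s - 1) if move > stay else (stay, s)
            if best > neg_inf:
                score[t][s] = best + log_emis[t][s]
                back[t][s] = prev
    # Backtrace from the last state, which the path is forced to end in.
    path = [n_states - 1]
    for t in range(n_frames - 1, 0, -1):
        path.append(back[t][path[-1]])
    path.reverse()
    return path


def align_streams(streams, sync_points):
    """Align each stream independently between shared synchronisation points.

    streams maps a stream name to (log_emis, states_per_segment), where
    states_per_segment[i] is the number of states that stream uses in
    segment i. sync_points lists segment boundaries as frame indices,
    starting at 0 and ending at the total number of frames.
    """
    result = {}
    for name, (log_emis, states_per_segment) in streams.items():
        path, offset = [], 0
        segments = zip(sync_points, sync_points[1:])
        for i, (start, end) in enumerate(segments):
            n = states_per_segment[i]
            # Restrict the search to this segment's frames and states.
            segment = [row[offset:offset + n] for row in log_emis[start:end]]
            path.extend(offset + s for s in forced_align(segment, n))
            offset += n
        result[name] = path
    return result


if __name__ == "__main__":
    good, bad = 0.0, -5.0
    hand = [[good, bad, bad]] + [[bad, good, bad]] * 2 + [[bad, bad, good]] * 3
    mouth = [[good, bad, bad]] * 3 + [[bad, good, bad]] * 2 + [[bad, bad, good]]
    streams = {"hand": (hand, [2, 1]), "mouth": (mouth, [1, 2])}
    print(align_streams(streams, [0, 3, 6]))
```

In this toy setting, both streams must change segment at frame 3, but each stream is free to place its own internal state boundaries within a segment. That is the sense in which the streams run in parallel yet stay synchronised.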

Item Type: Article
Divisions : Faculty of Engineering and Physical Sciences > Electronic Engineering
Authors :
Koller, Oscar
Camgöz, Necati Cihan
Ney, Hermann
Bowden, Richard
Date : 15 April 2019
Funders : Engineering and Physical Sciences Research Council (EPSRC)
DOI : 10.1109/TPAMI.2019.2911077
Grant Title : ExTOL
Copyright Disclaimer : © 2019 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.
Uncontrolled Keywords : Weakly supervised learning; Hybrid CNN-LSTM-HMMs; Continuous sign language recognition; Lip reading; Hand shape recognition; Hidden Markov models; Assistive technology; Gesture recognition; Synchronization; Shape; Supervised learning; Speech recognition
Depositing User : Clive Harris
Date Deposited : 09 May 2019 13:23
Last Modified : 10 May 2019 11:44
