University of Surrey


Neural Sign Language Translation

Camgöz, Necati Cihan, Hadfield, Simon, Koller, O, Ney, H and Bowden, Richard (2018) Neural Sign Language Translation. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR) 2018, 18-22 June 2018, Salt Lake City, Utah, USA.

Text
camgoz2018cvpr (002).pdf - Accepted version Manuscript

Download (2MB)

Abstract

Sign Language Recognition (SLR) has been an active research field for the last two decades. However, most research to date has treated SLR as a naive gesture recognition problem: it seeks to recognize a sequence of continuous signs but neglects the rich grammatical and linguistic structures of sign language, which differ from those of spoken language. In contrast, we introduce the Sign Language Translation (SLT) problem. Here, the objective is to generate spoken language translations from sign language videos, taking into account the different word orders and grammar. We formalize SLT in the framework of Neural Machine Translation (NMT) for both end-to-end and pretrained settings (using expert knowledge). This allows us to jointly learn the spatial representations, the underlying language model, and the mapping between sign and spoken language. To evaluate the performance of Neural SLT, we collected the first publicly available Continuous SLT dataset, RWTH-PHOENIX-Weather 2014T. It provides spoken language translations and gloss-level annotations for German Sign Language videos of weather broadcasts. Our dataset contains over 0.95M frames with >67K signs from a sign vocabulary of >1K and >99K words from a German vocabulary of >2.8K. We report quantitative and qualitative results for various SLT setups to underpin future research in this newly established field. The upper bound for translation performance is calculated at 19.26 BLEU-4, while our end-to-end frame-level and gloss-level tokenization networks achieve 9.58 and 18.13 respectively.
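The abstract frames SLT as attention-based encoder-decoder NMT over tokenized sign video: spatial embeddings of frames (or glosses) are encoded, and a decoder attends over them to emit spoken-language words. As a rough illustration only, the NumPy sketch below shows a single attention-weighted decoding step over frame embeddings; all dimensions, weights, and the linear "encoder" stand-in are invented for the sketch and are not the paper's actual architecture:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes -- not the paper's configuration.
T, D = 8, 16   # T video frames, D-dim spatial embedding per frame (e.g. CNN output)
H, V = 32, 10  # hidden size, spoken-language vocabulary size

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

# Encoder: project each frame embedding to a hidden state (toy stand-in for an RNN).
frames = rng.normal(size=(T, D))          # spatial embeddings of the sign video
W_enc = rng.normal(size=(D, H)) * 0.1
enc_states = np.tanh(frames @ W_enc)      # (T, H)

# One decoder step: attend over all encoder states, then predict a word.
W_dec = rng.normal(size=(H, V)) * 0.1
query = rng.normal(size=H)                # previous decoder state (toy stand-in)
scores = enc_states @ query               # dot-product attention scores, shape (T,)
alpha = softmax(scores)                   # attention weights over the T frames
context = alpha @ enc_states              # (H,) weighted summary of the video
word_probs = softmax(context @ W_dec)     # distribution over the V spoken words
```

The attention weights `alpha` let the decoder align each output word with the relevant span of the video, which is what allows the model to handle the differing word orders between sign and spoken language.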

Item Type: Conference or Workshop Item (Conference Paper)
Divisions : Faculty of Engineering and Physical Sciences > Electronic Engineering
Authors :
Name | Email | ORCID
Camgöz, Necati Cihan | n.camgoz@surrey.ac.uk |
Hadfield, Simon | s.hadfield@surrey.ac.uk |
Koller, O | |
Ney, H | |
Bowden, Richard | R.Bowden@surrey.ac.uk |
Date : 2018
Copyright Disclaimer : © 2018 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.
Depositing User : Melanie Hughes
Date Deposited : 01 May 2018 11:43
Last Modified : 24 Jul 2018 10:25
URI: http://epubs.surrey.ac.uk/id/eprint/846335



© The University of Surrey, Guildford, Surrey, GU2 7XH, United Kingdom.
+44 (0)1483 300800