University of Surrey


Text2Sign: Towards Sign Language Production Using Neural Machine Translation and Generative Adversarial Networks.

Stoll, Stephanie, Camgöz, Necati Cihan, Hadfield, Simon and Bowden, Richard (2020) Text2Sign: Towards Sign Language Production Using Neural Machine Translation and Generative Adversarial Networks. International Journal of Computer Vision.

IJCV_BMVC.pdf - Accepted version Manuscript
Restricted to Repository staff only until 3 January 2021.
Available under License Creative Commons Attribution.


Abstract

We present a novel approach to automatic Sign Language Production using recent developments in Neural Machine Translation (NMT), Generative Adversarial Networks, and motion generation. Our system is capable of producing sign videos from spoken language sentences. Unlike current approaches, which depend on heavily annotated data, our approach requires minimal gloss- and skeletal-level annotations for training. We achieve this by breaking down the task into dedicated sub-processes. We first translate spoken language sentences into sign pose sequences by combining an NMT network with a Motion Graph. The resulting pose information is then used to condition a generative model that produces photo-realistic sign language video sequences. This is the first approach to continuous sign video generation that does not use a classical graphical avatar. We evaluate the translation abilities of our approach on the PHOENIX14T Sign Language Translation dataset. We set a baseline for text-to-gloss translation, reporting a BLEU-4 score of 16.34/15.26 on dev/test sets. We further demonstrate the video generation capabilities of our approach for both multi-signer and high-definition settings, qualitatively and quantitatively, using broadcast quality assessment metrics.
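For readers unfamiliar with the BLEU-4 metric quoted in the abstract, the following is a minimal sketch of how a corpus-level BLEU-4 score can be computed for gloss sequences. It is not the authors' evaluation code: the function names `ngrams` and `corpus_bleu4` and the example glosses are hypothetical, and the sketch assumes one reference per hypothesis, whitespace tokenisation, and no smoothing, so it will not necessarily reproduce the reported figures.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count every contiguous n-gram in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu4(references, hypotheses):
    """Corpus-level BLEU-4 as a percentage (assumes one reference per hypothesis)."""
    matched = [0] * 4  # clipped n-gram matches for n = 1..4
    total = [0] * 4    # n-grams proposed by the hypotheses for n = 1..4
    ref_len = hyp_len = 0
    for ref, hyp in zip(references, hypotheses):
        ref_len += len(ref)
        hyp_len += len(hyp)
        for n in range(1, 5):
            ref_counts = ngrams(ref, n)
            hyp_counts = ngrams(hyp, n)
            # Clip each hypothesis n-gram count by its count in the reference.
            matched[n - 1] += sum(min(c, ref_counts[g]) for g, c in hyp_counts.items())
            total[n - 1] += sum(hyp_counts.values())
    # Without smoothing, any n-gram order with zero matches gives a score of 0.
    if hyp_len == 0 or min(matched) == 0:
        return 0.0
    # Geometric mean of the four modified precisions, computed in log space.
    log_precision = sum(math.log(m / t) for m, t in zip(matched, total)) / 4
    # The brevity penalty punishes hypotheses shorter than their references.
    brevity_penalty = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return 100 * brevity_penalty * math.exp(log_precision)


# Made-up gloss sequences, for illustration only.
reference = ["MORGEN REGEN NORD WIND STARK".split()]
hypothesis = ["MORGEN REGEN NORD WIND".split()]
score = corpus_bleu4(reference, hypothesis)
```

In this made-up example every n-gram in the hypothesis also appears in the reference, so the score is reduced only by the brevity penalty for the missing final gloss.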

Item Type: Article
Divisions : Faculty of Engineering and Physical Sciences > Electronic Engineering > Centre for Vision Speech and Signal Processing
Authors :
Name | Email | ORCID
Stoll, Stephanie | s.m.stoll@surrey.ac.uk |
Camgöz, Necati Cihan | n.camgoz@surrey.ac.uk |
Hadfield, Simon | s.hadfield@surrey.ac.uk |
Bowden, Richard | R.Bowden@surrey.ac.uk |
Date : 2 January 2020
Funders : SNSF Sinergia project, European Union’s Horizon 2020 research and innovation programme, EPSRC - Engineering and Physical Sciences Research Council, NVIDIA Corporation
DOI : 10.1007/s11263-019-01281-2
Copyright Disclaimer : © The Author(s) 2019. Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
Uncontrolled Keywords : Generative adversarial networks; Neural machine translation; Sign language production
Depositing User : Diane Maxfield
Date Deposited : 24 Jan 2020 12:48
Last Modified : 05 Feb 2020 10:07
URI: http://epubs.surrey.ac.uk/id/eprint/853393


© The University of Surrey, Guildford, Surrey, GU2 7XH, United Kingdom.
+44 (0)1483 300800