University of Surrey


Using deep neural networks to estimate tongue movements from speech face motion

Kroos, Christian, Bundgaard-Nielsen, RL, Best, CT and Plumbley, Mark (2017) Using deep neural networks to estimate tongue movements from speech face motion. In: 14th International Conference on Auditory-Visual Speech Processing (AVSP2017), 25-26 August 2017, Stockholm, Sweden.

kroos_et_al_AVSP2017_revised.pdf - Accepted Version (211kB)

Abstract

This study concludes a tripartite investigation into the indirect visibility of the moving tongue in human speech as reflected in co-occurring changes of the facial surface. We were particularly interested in how the shared information is distributed over the range of contributing frequencies. In the current study we examine the degree to which tongue movements during speech can be reliably estimated from face motion using artificial neural networks. We simultaneously acquired data for both movement types; tongue movements were measured with Electromagnetic Articulography (EMA), face motion with a passive marker-based motion capture system. A multiresolution analysis using wavelets provided the desired decomposition into frequency subbands. In the two earlier studies of the project we established linear and non-linear relations between lingual and facial speech motions, as predicted by, and compatible with, previous research in auditory-visual speech. The results of the current study using a Deep Neural Network (DNN) for prediction show that a substantial amount of variance can be recovered (between 13.9% and 33.2%, depending on the speaker and tongue sensor location). Importantly, however, the recovered variance values and the root mean squared error values of the Euclidean distances between the measured and the predicted tongue trajectories are in the range of the linear estimations of our earlier study.
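The abstract does not specify the authors' framework, network architecture or hyperparameters. The following is a minimal, hypothetical Python sketch of the general pipeline it describes: wavelet multiresolution decomposition of facial motion features, a feedforward DNN estimating a tongue sensor trajectory, and evaluation via the RMSE of Euclidean distances and the proportion of variance recovered. The placeholder data, network shape and wavelet settings are assumptions for illustration only.

# Hypothetical sketch (not the authors' code): estimating a 3-D tongue sensor
# trajectory from facial motion-capture features with a feedforward DNN.
# Assumed tools: NumPy, PyWavelets, scikit-learn.
import numpy as np
import pywt
from sklearn.neural_network import MLPRegressor
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Placeholder random data standing in for real recordings:
# face: e.g. 20 facial markers x 3 coordinates; tongue: one EMA sensor (x, y, z).
n_samples = 5000
face = rng.standard_normal((n_samples, 60))
tongue = rng.standard_normal((n_samples, 3))

def wavelet_subbands(signal_2d, wavelet="db4", level=4):
    """Multiresolution analysis: decompose each channel into frequency
    subbands and stack the reconstructed subband signals as features."""
    bands = []
    for ch in signal_2d.T:
        coeffs = pywt.wavedec(ch, wavelet, level=level)
        for i in range(len(coeffs)):
            # Reconstruct one subband at a time (all other coefficients zeroed).
            part = [c if j == i else np.zeros_like(c) for j, c in enumerate(coeffs)]
            bands.append(pywt.waverec(part, wavelet)[: len(ch)])
    return np.column_stack(bands)

X = wavelet_subbands(face)   # facial subband features
y = tongue                   # tongue sensor trajectory to estimate

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, shuffle=False)

# Assumed architecture: a small fully connected network; the paper's DNN may differ.
dnn = MLPRegressor(hidden_layer_sizes=(256, 128, 64), max_iter=500, random_state=0)
dnn.fit(X_train, y_train)
pred = dnn.predict(X_test)

# RMSE of the Euclidean distances between measured and predicted trajectories.
dist = np.linalg.norm(y_test - pred, axis=1)
rmse = np.sqrt(np.mean(dist ** 2))

# Rough variance-recovered measure (the abstract reports 13.9-33.2%,
# depending on speaker and tongue sensor location, on real data).
var_recovered = 1.0 - np.var(y_test - pred) / np.var(y_test)
print(f"RMSE of Euclidean distances: {rmse:.3f}")
print(f"Variance recovered: {100 * var_recovered:.1f}%")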

Item Type: Conference or Workshop Item (Conference Paper)
Divisions : Faculty of Engineering and Physical Sciences > Electronic Engineering
Authors :
Name | Email | ORCID
Kroos, Christian | c.kroos@surrey.ac.uk | UNSPECIFIED
Bundgaard-Nielsen, RL | UNSPECIFIED | UNSPECIFIED
Best, CT | UNSPECIFIED | UNSPECIFIED
Plumbley, Mark | m.plumbley@surrey.ac.uk | UNSPECIFIED
Date : 25 August 2017
Funders : EPSRC
Uncontrolled Keywords : Face motion, tongue movements, deep neural networks, speech articulation, multiresolution analysis, wavelets, electromagnetic articulography
Depositing User : Melanie Hughes
Date Deposited : 27 Jun 2017 14:38
Last Modified : 27 Jun 2017 14:38
URI: http://epubs.surrey.ac.uk/id/eprint/841496
