University of Surrey


Speech Driven Face Synthesis from 3D Video

Ypsilos, IA, Hilton, A, Turkmani, A and Jackson, PJB (2004) Speech Driven Face Synthesis from 3D Video. In: 2nd International Symposium on 3D Data Processing, Visualization and Transmission, 2004-09-06 - 2004-09-09, Thessaloniki, Greece.

PDF: YpsilosEtAl_IS3DPVT04.pdf - Accepted Version (560Kb)
Available under licence: see the attached licence file, licence.txt (1516b).

Abstract

We present a framework for speech-driven synthesis of real faces from a corpus of 3D video of a person speaking. Video-rate capture of dynamic 3D face shape and colour appearance provides the basis for a visual speech synthesis model. A displacement map representation combines face shape and colour into a 3D video. This representation is used to efficiently register and integrate shape and colour information captured from multiple views. To allow visual speech synthesis, viseme primitives are identified from the corpus using automatic speech recognition. A novel nonrigid alignment algorithm is introduced to estimate dense correspondence between 3D face shape and appearance for different visemes. The registered displacement map representation, together with a novel optical flow optimisation using both shape and colour, enables accurate and efficient nonrigid alignment. Face synthesis from speech is performed by concatenation of the corresponding viseme sequence, using the nonrigid correspondence to reproduce both 3D face shape and colour appearance. Concatenative synthesis reproduces both viseme timing and co-articulation. Face capture and synthesis have been performed for a database of 51 people. Results demonstrate synthesis of 3D visual speech animation with a quality comparable to the captured video of a person.
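
The concatenation step described in the abstract can be illustrated with a small sketch. This is not the authors' method: the paper uses a nonrigid alignment to move between visemes, whereas the code below swaps in a plain linear cross-fade and assumes the frames are already in dense correspondence. The names `Frame`, `Unit`, `blend`, `concatenate` and the `overlap` parameter are hypothetical, and each frame is assumed to be a flattened list of displacement values.

```python
from typing import Dict, List, Sequence

Frame = List[float]  # hypothetical: one flattened, registered displacement map
Unit = List[Frame]   # hypothetical: short frame sequence stored for one viseme


def blend(a: Frame, b: Frame, w: float) -> Frame:
    """Linear interpolation between two registered frames: (1 - w) * a + w * b."""
    if len(a) != len(b):
        raise ValueError("frames must share one pixel layout")
    return [(1.0 - w) * x + w * y for x, y in zip(a, b)]


def concatenate(visemes: Sequence[str], corpus: Dict[str, Unit],
                overlap: int = 2) -> Unit:
    """Join the corpus units for `visemes`, cross-fading at each join.

    Assumption: a linear cross-fade stands in for the nonrigid
    correspondence used in the paper.
    """
    out: Unit = []
    for label in visemes:
        unit = corpus[label]
        # Never overlap more frames than either side actually has.
        n = min(overlap, len(out), len(unit))
        start = len(out) - n
        # Blend the last n output frames with the first n frames of the new unit.
        for i in range(n):
            w = (i + 1) / (n + 1)
            out[start + i] = blend(out[start + i], unit[i], w)
        # Append the part of the new unit that was not used in the blend.
        out.extend(unit[n:])
    return out


if __name__ == "__main__":
    # Toy corpus: one-value "frames" so the blending is easy to follow.
    toy = {"a": [[0.0], [0.0], [0.0]], "b": [[3.0], [3.0], [3.0]]}
    for frame in concatenate(["a", "b"], toy, overlap=2):
        print(frame)
```

A viseme label sequence would in practice come from the speech recogniser; here it is passed in directly as a list of strings.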

Item Type: Conference or Workshop Item (Paper)
Additional Information:

Copyright 2008 IEEE. Personal use of this material is permitted. However, permission to reprint/republish this material for advertising or promotional purposes or for creating new collective works for resale or redistribution to servers or lists, or to reuse any copyrighted component of this work in other works must be obtained from the IEEE.

Divisions: Faculty of Engineering and Physical Sciences > Electronic Engineering > Centre for Vision Speech and Signal Processing
Depositing User: Symplectic Elements
Date Deposited: 31 Jan 2012 16:39
Last Modified: 23 Sep 2013 18:51
URI: http://epubs.surrey.ac.uk/id/eprint/7754

