Visual Speech Synthesis from 3D Video
Edge, J and Hilton, A (2006) Visual Speech Synthesis from 3D Video. In: 3rd European Conference on Visual Media Production, 2006-11-29 - 2006-11-30, London.
Available under licence: see the attached licence file.
In this paper we describe a parameterisation of lip movements which maintains the dynamic structure inherent in the task of producing speech sounds. A stereo capture system is used to reconstruct 3D models of a speaker producing sentences from the TIMIT corpus. This data is mapped into a space which maintains the relationships between samples and their temporal derivatives. By incorporating dynamic information within the parameterisation of lip movements we can model both the cyclical structure and the causal nature of speech movements, as described by an underlying visual speech manifold. It is believed that such a structure will be appropriate to various areas of speech modelling, in particular the synthesis of speech lip movements.
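The idea of keeping samples together with their temporal derivatives can be illustrated with a minimal sketch. This is not the authors' parameterisation: the function name `add_temporal_derivatives`, the `weight` parameter, the finite-difference scheme, and the one-dimensional sine signal standing in for a lip parameter are all assumptions made for illustration. Each frame is extended with an estimate of its velocity, so that a cyclic movement traces a closed loop in the augmented space.

```python
import math


def add_temporal_derivatives(frames, dt, weight=1.0):
    """Append finite-difference velocities to each frame of parameters.

    frames: list of equal-length lists of floats, one per time step.
    dt:     time between consecutive frames, in seconds.
    weight: hypothetical scale factor balancing position against velocity.
    """
    n = len(frames)
    if n < 2:
        raise ValueError("need at least two frames to estimate a derivative")
    augmented = []
    for i, frame in enumerate(frames):
        # Central difference in the interior, one-sided at the two ends.
        lo = max(i - 1, 0)
        hi = min(i + 1, n - 1)
        span = (hi - lo) * dt
        velocity = [(b - a) / span for a, b in zip(frames[lo], frames[hi])]
        augmented.append(list(frame) + [weight * v for v in velocity])
    return augmented


# Hypothetical one-dimensional "lip opening" signal: about one cycle of a sine.
dt = 0.01
times = [k * dt for k in range(629)]
frames = [[math.sin(t)] for t in times]

# Each state is now [position, velocity]; for a sine these lie near a circle.
states = add_temporal_derivatives(frames, dt)
```

In this toy case the position alone revisits the same value twice per cycle, whereas the position and velocity pair distinguishes the opening phase from the closing phase, which is the kind of dynamic information the abstract refers to.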
Item Type: Conference or Workshop Item (Paper)
Copyright 2006 IEEE. Personal use of this material is permitted. However, permission to reprint/republish this material for advertising or promotional purposes or for creating new collective works for resale or redistribution to servers or lists, or to reuse any copyrighted component of this work in other works must be obtained from the IEEE.
Divisions: Faculty of Engineering and Physical Sciences > Electronic Engineering > Centre for Vision, Speech and Signal Processing
Depositing User: Symplectic Elements
Date Deposited: 05 Oct 2012 09:56
Last Modified: 09 Jun 2014 13:21