Comparing Visual Features for Lipreading
Lan, Y, Harvey, R, Theobald, B, Ong, EJ and Bowden, R (2009) Comparing Visual Features for Lipreading. In: AVSP 2009, 10-13 September 2009, Norwich, UK.
For automatic lipreading, there are many competing methods for feature extraction. Often, because of the complexity of the task, these methods are tested only on quite restricted datasets, such as the letters of the alphabet or digits, and on only a few speakers. In this paper we compare some of the leading methods for lip feature extraction, evaluating them on the GRID dataset, which uses a constrained vocabulary over, in this case, 15 speakers. Previously the GRID data has received limited attention because of the requirement to track the face and lips accurately. We overcome this via the use of a novel linear predictor (LP) tracker, which we use to control an Active Appearance Model (AAM). By ignoring shape and/or appearance parameters from the AAM, we can quantify the effect of appearance and/or shape when lip-reading. We find that shape alone is a useful cue for lipreading (which is consistent with human experiments). However, the incremental effect of shape over appearance appears not to be significant, which implies that the inner appearance of the mouth contains more information than the shape.
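The comparison described in the abstract rests on dropping either the shape or the appearance parameters of the fitted AAM before building the per-frame feature vector for the recogniser. The sketch below illustrates that feature-selection step only; the parameter counts, function name and use of NumPy are illustrative assumptions and not the authors' implementation.

```python
# Minimal sketch (assumptions, not the paper's code): selecting shape-only,
# appearance-only, or combined feature vectors from fitted AAM parameters.
import numpy as np

N_SHAPE = 10   # assumed number of AAM shape parameters per frame
N_APP = 20     # assumed number of AAM appearance parameters per frame

def lip_features(aam_params: np.ndarray, mode: str = "both") -> np.ndarray:
    """Return the per-frame feature matrix used for lipreading.

    aam_params: (T, N_SHAPE + N_APP) array of AAM parameters for T video frames,
                shape parameters first, then appearance parameters.
    mode: "shape" (ignore appearance), "appearance" (ignore shape), or "both".
    """
    shape = aam_params[:, :N_SHAPE]
    appearance = aam_params[:, N_SHAPE:N_SHAPE + N_APP]
    if mode == "shape":
        return shape
    if mode == "appearance":
        return appearance
    return np.hstack([shape, appearance])

if __name__ == "__main__":
    # Dummy utterance: 75 frames (e.g. a 3 s GRID sentence at 25 fps).
    frames = np.random.randn(75, N_SHAPE + N_APP)
    for mode in ("shape", "appearance", "both"):
        print(mode, lip_features(frames, mode).shape)
```

Each of the three feature sets would then be fed to the same recogniser, so that any difference in word accuracy can be attributed to the features rather than the back end.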
Item Type: Conference or Workshop Item (Conference Paper)
Divisions: Faculty of Engineering and Physical Sciences > Electronic Engineering > Centre for Vision Speech and Signal Processing
Depositing User: Symplectic Elements
Date Deposited: 18 Jun 2012 08:41
Last Modified: 09 Jun 2014 13:18