University of Surrey


Continuous, speaker-independent, speech recognition for a speech to viseme translator.

Kelleher, Holly. (1999) Continuous, speaker-independent, speech recognition for a speech to viseme translator. Doctoral thesis, University of Surrey (United Kingdom).

Available under License Creative Commons Attribution Non-commercial Share Alike.



The work presented in this thesis forms part of a research project that attempts to generate a visualisation of a speaker's mouth from purely acoustic speech signals. The aim is to provide an aid for partially hearing-impaired people, in which visual information is presented alongside limited acoustic signals to make the telephone easier to use. The system is essentially a low-level speech recogniser: phonemic information is extracted from the speech waveform and mapped onto visemes generated on a synthetic facial image. This thesis describes a major part of this project, namely the development of an accurate phoneme discriminator capable of speaker-independent operation on continuous speech.

The recognition process is realised in three stages: a pre-processor to convert the speech into a suitable parametric form; a pattern recogniser to identify the possible phoneme classes; and a post-processor to produce the viseme information. The pattern recognition stage uses a self-organising Kohonen network, followed by a Learning Vector Quantiser (LVQ) to further improve recognition accuracy. The performance of this stage depends strongly on the choice of pre-processor at the input to the network, and the design of the pre-processor stage forms a significant part of this work. A novel technique known as the pseudo-cepstrum forms the basis of this pre-processor.

Extensive investigations have been conducted into how performance depends on a range of parameters, both in the pre-processor stage and within the Kohonen classifier. In particular, the performance of several pre-processor techniques, including the pseudo-cepstrum, has been compared. Factors affecting both the training and the operation of the classifier are also described, with the sensitivity of recognition performance to the input data being a major issue. Overall recognition accuracies of 80% have been achieved.
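The pattern-recognition and post-processing stages described above can be illustrated with a minimal sketch. It is not the thesis's implementation: it trains a generic Kohonen self-organising map, fine-tunes the codebook with the standard LVQ1 rule, and maps recognised phoneme labels to visemes. The pseudo-cepstrum pre-processor is not reproduced; 2-D Gaussian clusters stand in for its feature vectors. The grid size, learning-rate schedules, unit-labelling rule and the `PHONEME_TO_VISEME` table are all hypothetical assumptions chosen for illustration.

```python
import numpy as np

# Hypothetical phoneme-to-viseme table: several phonemes share one mouth shape.
PHONEME_TO_VISEME = {"p": "bilabial", "b": "bilabial", "m": "bilabial",
                     "f": "labiodental", "v": "labiodental", "aa": "open"}

def train_som(data, grid_w=4, grid_h=4, epochs=20, lr0=0.5, seed=0):
    """Unsupervised Kohonen training of a 2-D grid of codebook vectors."""
    rng = np.random.default_rng(seed)
    n_units = grid_w * grid_h
    # (column, row) position of each unit on the map grid
    coords = np.array([(i % grid_w, i // grid_w) for i in range(n_units)], dtype=float)
    # initialise codebook vectors from randomly chosen training samples
    weights = data[rng.integers(0, len(data), n_units)].astype(float)
    sigma0 = max(grid_w, grid_h) / 2.0
    total, step = epochs * len(data), 0
    for _ in range(epochs):
        for x in data[rng.permutation(len(data))]:
            frac = step / total
            lr = lr0 * (1.0 - frac)                   # learning rate decays linearly
            sigma = max(sigma0 * (1.0 - frac), 0.5)   # neighbourhood radius shrinks
            bmu = np.argmin(np.sum((weights - x) ** 2, axis=1))  # best-matching unit
            d2 = np.sum((coords - coords[bmu]) ** 2, axis=1)
            h = np.exp(-d2 / (2.0 * sigma ** 2))      # Gaussian neighbourhood on the grid
            weights += lr * h[:, None] * (x - weights)
            step += 1
    return weights

def label_units(weights, data, labels):
    """Give each unit the class of its nearest training vector (one simple calibration choice)."""
    d = np.sum((weights[:, None, :] - data[None, :, :]) ** 2, axis=2)
    return [labels[j] for j in np.argmin(d, axis=1)]

def train_lvq1(weights, unit_labels, data, labels, epochs=10, alpha0=0.05, seed=0):
    """LVQ1: move the winning unit toward a sample of its own class, away otherwise."""
    rng = np.random.default_rng(seed)
    w = weights.copy()
    total, step = epochs * len(data), 0
    for _ in range(epochs):
        for i in rng.permutation(len(data)):
            alpha = alpha0 * (1.0 - step / total)
            c = np.argmin(np.sum((w - data[i]) ** 2, axis=1))
            sign = 1.0 if unit_labels[c] == labels[i] else -1.0
            w[c] += sign * alpha * (data[i] - w[c])
            step += 1
    return w

def classify(weights, unit_labels, x):
    """Return the phoneme label of the nearest codebook vector."""
    return unit_labels[int(np.argmin(np.sum((weights - x) ** 2, axis=1)))]

def to_visemes(phonemes):
    """Post-processor: map a phoneme sequence onto viseme classes."""
    return [PHONEME_TO_VISEME[p] for p in phonemes]

if __name__ == "__main__":
    # Synthetic stand-in features: one 2-D Gaussian cluster per phoneme class.
    rng = np.random.default_rng(1)
    centres = {"p": (0.0, 0.0), "f": (5.0, 5.0), "aa": (10.0, 0.0)}
    data = np.vstack([rng.normal(c, 0.5, size=(50, 2)) for c in centres.values()])
    labels = [p for p in centres for _ in range(50)]
    w = train_som(data)
    units = label_units(w, data, labels)
    w = train_lvq1(w, units, data, labels)
    frames = [data[0], data[60], data[120]]
    print(to_visemes([classify(w, units, x) for x in frames]))
```

The pipeline runs `train_som`, then `label_units`, then `train_lvq1`; each feature frame is then classified and the resulting phoneme sequence passed to `to_visemes`. LVQ1 is used here only because it is the simplest LVQ variant; the abstract does not state which variant the thesis used.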

Item Type: Thesis (Doctoral)
Divisions: Theses
Authors: Kelleher, Holly.
Date: 1999
Depositing User: EPrints Services
Date Deposited: 09 Nov 2017 12:18
Last Modified: 20 Jun 2018 11:53


© The University of Surrey, Guildford, Surrey, GU2 7XH, United Kingdom.
+44 (0)1483 300800