University of Surrey

Test tubes in the lab Research in the ATI Dance Research

Robust text-independent speaker recognition over telecommunications systems.

Uzuner, Halil. (2006) Robust text-independent speaker recognition over telecommunications systems. Doctoral thesis, University of Surrey (United Kingdom)..

Available under License Creative Commons Attribution Non-commercial Share Alike.

Download (29MB) | Preview


Biometric recognition methods, using human features such as voice, face or fingeorprints, are increasingly popular for user authentication. Voice is unique in that it is a non-intrusive biometric which can be transmitted over the existing telecommunication networks, thereby allowing remote authentication. Current spealcer recognition systems can provide high recognition rates on clean speech signals. However, their performance has been shown to degrade in real-life applications such as telephone banking, where speech compression and background noise can affect the speech signal. In this work, three important advancements have been introduced to improve the speaker recognition performance, where it is affected by the coder mismatch, the aliasing distortion caused by the Line Spectral Frequency (LSF) parameter extraction, and the background noise. The first advancement focuses on investigating the speaker recognition system performance in a multi-coder environment using a Speech Coder Detection (SCD) System, which minimises training and testing data mismatch and improves the speaker recognition performance. Having reduced the speaker recognition error rates for multi-coder environment, further investigation on GSM-EFR speech coder is performed to deal with a particular - problem related to LSF parameter extraction method. It has been previously shown that the classic technique for extraction of LSF parameters in speech coders is prone to aliasing distortion. Low-pass filtering on up-sampled LSF vectors has been shown to alleviate this problem, therefore improving speech quality. In this thesis, as a second advancement, the Non-Aliased LSF (NA-LSF) extraction method is introduced in order to reduce the unwanted effects of GSM-EFR coder on speaker recognition performance. Another important factor that effects the performance of speaker recognition systems is the presence of the background noise. Background noise might severely reduce the performance of the targeted application such as quality of the coded speech, or the performance of the speaker recognition systems. The third advancement was achieved by using a noise-canceller to improve the speaker recognition performance in mismatched environments with varying background noise conditions. Speaker recognition system with a Minimum Mean Square Error - Log Spectral Amplitudes (MMSE-LSA) noise- canceller used as a pre-processor is proposed and investigated to determine the efficiency of noise cancellation on the speaker recognition performance using speech corrupted by different background noise conditions. Also the effects of noise cancellation on speaker recognition performance using coded noisy speech have been investigated. Key words; Identification, Verification, Recognition, Gaussian Mixture Models, Speech Coding, Noise Cancellation.

Item Type: Thesis (Doctoral)
Divisions : Theses
Authors :
Uzuner, Halil.
Date : 2006
Contributors :
Depositing User : EPrints Services
Date Deposited : 09 Nov 2017 12:14
Last Modified : 15 Mar 2018 21:05

Actions (login required)

View Item View Item


Downloads per month over past year

Information about this web site

© The University of Surrey, Guildford, Surrey, GU2 7XH, United Kingdom.
+44 (0)1483 300800