University of Surrey

Test tubes in the lab Research in the ATI Dance Research

Model selection on probability density estimation using Gaussian mixtures.

Sardo, Lucia. (1997) Model selection on probability density estimation using Gaussian mixtures. Doctoral thesis, University of Surrey (United Kingdom)..

Available under License Creative Commons Attribution Non-commercial Share Alike.

Download (21MB) | Preview


This thesis proposes Gaussian Mixtures as a flexible semiparametric tool for density estimation and addresses the problem of model selection for this class of density estimators. First, a brief introduction to various techniques for model selection proposed in literature is given. The most commonly used techniques are cross validation nad methods based on data reuse and they all are either computationally very intensive or extremely demanding in terms of training set size. Another class of methods known as information criteria allows model selection at a much lower computational cost and for any sample size. The main objective of this study is to develop a technique for model selection that is not too computationally demanding, while capable of delivering an acceptable performance on a range of problems of various dimensionality. Another important issue addressed is the effect of the sample size. Large data sets are often difficult and costly to obtain, hence keeping the sample size within reasonable limits is also very important. Nevertheless sample size is central to the problem of density estimation and one cannot expect good results with extremely limited samples. Information Criteria are the most suitable candidates for a model selection procedure fulfilling these requirements. The well-known criterion Schwarz's Bayesian Information Criterion (BIC) has been analysed and its deficiencies when used with data of large dimensionality data are noted. A modification that improves on BIC criterion is proposed and named Maximum Penalised Likelihood (MPL) criterion. This criterion has the advantage that it can adapted to the data and its satisfactory performance is demonstrated experimentally. Unfortunately all information criteria, including the proposed MPL, suffer from a major drawback: a strong assumption of simplicity of the density to be estimated. This can lead to badly underfitted estimates, especially for small sample size problems. As a solution to such deficiencies, a procedure for validating the different models, based on an assessment of the model predictive performance, is proposed. The optimality criterion for model selection can be formulated as follow; if a model is able to predict the observed data frequencies within the statistical error, it is an acceptable model, otherwise it is rejected. An attractive feature of such a measure of goodness is the fact that it is an absolute measure, rather than a relative one, which would only provide a ranking between candidated models.

Item Type: Thesis (Doctoral)
Divisions : Theses
Authors :
Sardo, Lucia.
Date : 1997
Contributors :
Depositing User : EPrints Services
Date Deposited : 09 Nov 2017 12:11
Last Modified : 16 Mar 2018 18:03

Actions (login required)

View Item View Item


Downloads per month over past year

Information about this web site

© The University of Surrey, Guildford, Surrey, GU2 7XH, United Kingdom.
+44 (0)1483 300800