University of Surrey


Model selection on probability density estimation using Gaussian mixtures.

Sardo, Lucia. (1997) Model selection on probability density estimation using Gaussian mixtures. Doctoral thesis, University of Surrey (United Kingdom).

Full text is not currently available. Please contact sriopenaccess@surrey.ac.uk, should you require it.

Abstract

This thesis proposes Gaussian mixtures as a flexible semiparametric tool for density estimation and addresses the problem of model selection for this class of density estimators. First, a brief introduction is given to the various model selection techniques proposed in the literature. The most commonly used techniques are cross-validation and methods based on data reuse, all of which are either computationally very intensive or extremely demanding in terms of training set size. Another class of methods, known as information criteria, allows model selection at a much lower computational cost and for any sample size. The main objective of this study is to develop a model selection technique that is not too computationally demanding, yet is capable of delivering acceptable performance on a range of problems of various dimensionality. Another important issue addressed is the effect of the sample size. Large data sets are often difficult and costly to obtain, so keeping the sample size within reasonable limits is also very important. Nevertheless, sample size is central to the problem of density estimation, and one cannot expect good results with extremely limited samples. Information criteria are the most suitable candidates for a model selection procedure that fulfils these requirements. Schwarz's well-known Bayesian Information Criterion (BIC) is analysed, and its deficiencies when used with high-dimensional data are noted. A modification that improves on BIC is proposed and named the Maximum Penalised Likelihood (MPL) criterion. This criterion has the advantage that it can be adapted to the data, and its satisfactory performance is demonstrated experimentally. Unfortunately, all information criteria, including the proposed MPL, suffer from a major drawback: a strong assumption of simplicity of the density to be estimated. This can lead to badly underfitted estimates, especially for small sample size problems.
As a solution to such deficiencies, a procedure for validating the different models, based on an assessment of the model's predictive performance, is proposed. The optimality criterion for model selection can be formulated as follows: if a model is able to predict the observed data frequencies within the statistical error, it is an acceptable model; otherwise it is rejected. An attractive feature of such a measure of goodness is that it is an absolute measure, rather than a relative one, which would only provide a ranking of the candidate models.
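To make the role of an information criterion concrete, the following is a minimal sketch of selecting the number of mixture components with the standard BIC, defined as -2 ln L + p ln n for maximised likelihood L, p free parameters and n samples. It is not the thesis's MPL criterion or its validation procedure, neither of which is specified in this abstract. The sketch assumes one-dimensional data, a plain EM fit with a variance floor, and a fixed iteration count; all function names (`fit_gmm_1d`, `bic`, `select_k_by_bic`) are hypothetical and chosen for illustration only.

```python
import math
import random


def normal_pdf(x, mean, var):
    """Density of a 1-D Gaussian with the given mean and variance."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def mixture_pdf(x, weights, means, variances):
    """Density of the mixture: a weighted sum of Gaussian components."""
    return sum(w * normal_pdf(x, m, v) for w, m, v in zip(weights, means, variances))


def fit_gmm_1d(data, k, iters=200, seed=0):
    """Fit a k-component 1-D Gaussian mixture by plain EM (illustrative only).

    Assumes `data` is a list of floats that are not all identical.
    Returns (weights, means, variances, log_likelihood).
    """
    rng = random.Random(seed)
    n = len(data)
    grand_mean = sum(data) / n
    grand_var = sum((x - grand_mean) ** 2 for x in data) / n
    weights = [1.0 / k] * k
    means = rng.sample(data, k)      # initial means: k randomly chosen data points
    variances = [grand_var] * k
    var_floor = 1e-6 * grand_var     # assumption: floor to stop components collapsing

    for _ in range(iters):
        # E-step: responsibility of each component for each point.
        resp = []
        for x in data:
            p = [w * normal_pdf(x, m, v) for w, m, v in zip(weights, means, variances)]
            total = sum(p) or 1e-300
            resp.append([pj / total for pj in p])
        # M-step: re-estimate weights, means and variances.
        for j in range(k):
            nj = sum(r[j] for r in resp) or 1e-300
            weights[j] = nj / n
            means[j] = sum(r[j] * x for r, x in zip(resp, data)) / nj
            var_j = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, data)) / nj
            variances[j] = max(var_j, var_floor)

    log_lik = sum(
        math.log(max(mixture_pdf(x, weights, means, variances), 1e-300)) for x in data
    )
    return weights, means, variances, log_lik


def bic(log_lik, k, n):
    """Schwarz's BIC for a 1-D mixture: lower is better."""
    num_params = 3 * k - 1  # (k - 1) free weights + k means + k variances
    return -2.0 * log_lik + num_params * math.log(n)


def select_k_by_bic(data, candidates=(1, 2, 3, 4)):
    """Fit one mixture per candidate k and return (best_k, {k: BIC})."""
    scores = {}
    for k in candidates:
        *_, log_lik = fit_gmm_1d(data, k)
        scores[k] = bic(log_lik, k, len(data))
    best_k = min(scores, key=scores.get)
    return best_k, scores
```

Calling `select_k_by_bic(samples)` on a list of floats returns the candidate with the lowest BIC together with all scores. Note that this yields only a ranking of the candidates, which is the kind of relative measure the abstract contrasts with its proposed absolute acceptance test.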

Item Type: Thesis (Doctoral)
Divisions : Theses
Authors : Sardo, Lucia (Email: UNSPECIFIED; ORCID: UNSPECIFIED)
Date : 1997
Contributors : Contribution: http://www.loc.gov/loc.terms/relators/THS (Name: UNSPECIFIED; Email: UNSPECIFIED; ORCID: UNSPECIFIED)
Depositing User : EPrints Services
Date Deposited : 09 Nov 2017 12:11
Last Modified : 09 Nov 2017 14:39
URI: http://epubs.surrey.ac.uk/id/eprint/842833


© The University of Surrey, Guildford, Surrey, GU2 7XH, United Kingdom.
+44 (0)1483 300800