Ophthalmic statistics note 9: parametric versus nonparametric methods for data analysis
Skene, Simon, Bunce, C, Freemantle, N and Dore, C J (2016) Ophthalmic statistics note 9: parametric versus nonparametric methods for data analysis. BMJ Open Ophthalmology, 100 (7). pp. 877-878.
Full text not available from this repository.
Abstract
INTRODUCTION/SCENARIO Distributions of measured data are often well modelled by known probability distributions, which provide a useful description of their underlying properties such as location (average), spread (variation) and shape. Statisticians use probability distributions to interpret and attribute meaning, draw conclusions and answer research questions using the measurements or data that researchers gather during their studies. Different types of data follow different probability distributions, and these distributions are characterised by certain features called parameters. Even the most statistically averse of researchers is likely to have heard of the normal distribution, which is often used to approximate the distribution of continuous or measurement data such as intraocular pressure, central retinal thickness and degree of proptosis. The normal distribution follows a ‘bell-shaped curve’ (although with a rim stretching to ±infinity), the shape of which is specified by the mean and SD, with different values of each giving rise to different bell-shaped curves (see figure 1). Other distributions such as the binomial and Poisson probability distributions are less commonly reported in ophthalmic research and are characterised by different parameters. The binomial distribution is used for dichotomous data and is characterised by the probability of success, that is, the number of ‘successes’ out of a total number of observed events, for example, the proportion of graft transplants that fail within 6 months of transplantation. The Poisson distribution is used for count data and is characterised by the mean number of events, for example, endophthalmitis rates. The assumption that the observed data follow such probability distributions allows a statistician to apply appropriate statistical tests, which are known as parametric tests.
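As a rough illustration of how a handful of parameters pins down each of these distributions, the following Python sketch uses only the standard library. All numerical values (a mean intraocular pressure of 16 mmHg with an SD of 3, a graft failure probability of 0.1 among 20 grafts, a mean of 0.5 events) are hypothetical and chosen purely for illustration; they are not taken from the article.

```python
from math import comb, exp, factorial
from statistics import NormalDist

# Normal distribution: fully specified by a mean and an SD.
# Hypothetical intraocular pressure values in mmHg (illustrative only).
MEAN_IOP, SD_IOP = 16.0, 3.0
iop = NormalDist(mu=MEAN_IOP, sigma=SD_IOP)

# Share of the distribution lying within 1.96 SDs of the mean.
share_within_1_96_sd = (
    iop.cdf(MEAN_IOP + 1.96 * SD_IOP) - iop.cdf(MEAN_IOP - 1.96 * SD_IOP)
)


def binomial_pmf(k: int, n: int, p: float) -> float:
    """Probability of exactly k 'successes' in n independent events,
    each with success probability p (the binomial parameter)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def poisson_pmf(k: int, mean: float) -> float:
    """Probability of exactly k events when the mean number of events
    (the Poisson parameter) is `mean`."""
    return exp(-mean) * mean**k / factorial(k)


# Hypothetical examples: 2 failures among 20 grafts when each fails with
# probability 0.1, and no events when the mean count is 0.5.
p_two_failures = binomial_pmf(2, n=20, p=0.1)
p_no_events = poisson_pmf(0, mean=0.5)

print(f"Normal: share within 1.96 SDs = {share_within_1_96_sd:.4f}")
print(f"Binomial: P(2 failures in 20) = {p_two_failures:.4f}")
print(f"Poisson: P(0 events) = {p_no_events:.4f}")
```

Changing the mean or SD passed to `NormalDist` gives a different bell-shaped curve, while the binomial and Poisson functions each depend on their own single parameter.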
The normal distribution is a powerful tool provided the data plausibly arise from that distribution or can be made to reasonably approximate this following a suitable transformation such as by taking natural logarithms to reduce asymmetry. The normal distribution also serves as an approximating distribution to the Poisson or binomial distribution under certain circumstances or can be used for large samples to approximate the distribution of the sample mean via the central limit theorem. Tests based on the normal distribution are therefore extremely useful and form the basis of many analyses, the usual tests being z tests or t tests, which rely on approximate normality or normality, respectively.1 If we can assume a normal distribution, then we expect 95% of values to lie within 1.96 SDs of the mean. Parametric tests make assumptions about the distribution of the data and sometimes it may be impossible to assess these assumptions, perhaps because the sample size is small or because the data do not follow any of the more common probability distributions. Alternatively, we may be interested in making inferences about medians rather than means or about ordinal or ranked data. In such circumstances, statisticians may adopt an alternative class of statistical tests, which are known as nonparametric or distribution-free methods. These methods work by ranking the data in numerical order and analysing these ranks rather than the actual measurements observed. Two of the most well-known nonparametric methods are the Mann–Whitney test (or U test) and the Wilcoxon matched-pairs signed-rank test, which are suitable for data from two unpaired samples or two paired samples, respectively.2 3 The Wilcoxon matched-pairs signed-rank test calculates the differences between each matched pair in the two samples and replaces their absolute values with their ordered ranks (1, 2, 3, etc), ignoring zeros.
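The ranking step just described can be sketched in a few lines of Python. This is a minimal illustration rather than a full test: it assumes tied absolute differences share the average of their ranks (a common convention that the text does not spell out), it does not compute a p value, and the paired measurements are hypothetical. The function names are invented for this sketch.

```python
def average_ranks(values):
    """Rank values from 1 upward; tied values share the mean of their ranks
    (an assumed convention for handling ties)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the block while the next value equals the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1  # mean of ranks start+1 .. end+1
        for pos in range(start, end + 1):
            ranks[order[pos]] = shared
        start = end + 1
    return ranks


def signed_rank_sums(first, second):
    """Return (sum of ranks of positive differences,
    sum of ranks of negative differences) for paired samples."""
    # Differences within each matched pair, ignoring zeros.
    diffs = [a - b for a, b in zip(first, second) if a != b]
    # Replace absolute differences with their ranks.
    ranks = average_ranks([abs(d) for d in diffs])
    positive = sum(r for d, r in zip(diffs, ranks) if d > 0)
    negative = sum(r for d, r in zip(diffs, ranks) if d < 0)
    return positive, negative


# Hypothetical paired measurements (e.g. before and after a treatment).
before = [21, 19, 24, 18, 22, 20, 25, 23]
after = [18, 19, 20, 19, 17, 18, 19, 16]

rank_sum_positive, rank_sum_negative = signed_rank_sums(before, after)
print(rank_sum_positive, rank_sum_negative)
```

The two returned totals are the quantities the test compares; one pair in the hypothetical data has a zero difference and is dropped before ranking.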
Under the null hypothesis of no difference between samples, the sums of the positive and negative ranks should be similar. The test statistic is usually taken to be the smaller of the two sums, and exact p values can be found using statistical software or by comparison with statistical tables. The Mann–Whitney U test effectively considers all pairs of observations from two independent samples and calculates the number of pairs for which an observation in one sample is preceded by an observation from the other. Again, the U statistic can be calculated from the summed ranks within each sample, found by ordering the pooled observations. Such tests depend only on the rank ordering of the observed values and not on any assumptions about their underlying distributions, so that there are no associated parameters to be estimated, and in that sense such methods are considered nonparametric or distribution-free. These are easily implemented in standard statistical software packages such as R, Stata, SAS or SPSS.
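The two routes to the U statistic mentioned above, counting pairs directly and working from summed ranks in the pooled ordering, can be checked against each other with a short Python sketch. It assumes that tied observations count as half a pair and receive mid-ranks, a common convention not stated in the text; the function names and the two samples are hypothetical, and no p value is computed.

```python
def u_by_pair_counting(sample_a, sample_b):
    """Count pairs (a, b) in which the observation from sample_b precedes
    the one from sample_a; ties count as one half (assumed convention)."""
    u = 0.0
    for a in sample_a:
        for b in sample_b:
            if b < a:
                u += 1.0
            elif b == a:
                u += 0.5
    return u


def u_by_rank_sum(sample_a, sample_b):
    """Same statistic from the summed ranks of sample_a among the pooled
    observations: U = R_a - n_a * (n_a + 1) / 2."""
    pooled = list(sample_a) + list(sample_b)

    def mid_rank(x):
        # Rank of x in the pooled data, averaging over tied values.
        below = sum(1 for v in pooled if v < x)
        equal = sum(1 for v in pooled if v == x)
        return below + (equal + 1) / 2

    rank_sum_a = sum(mid_rank(a) for a in sample_a)
    n_a = len(sample_a)
    return rank_sum_a - n_a * (n_a + 1) / 2


# Hypothetical measurements from two independent groups.
group_a = [14, 17, 19, 21, 22]
group_b = [12, 13, 15, 16, 18, 20]

u_a = u_by_pair_counting(group_a, group_b)
u_b = u_by_pair_counting(group_b, group_a)
print(u_a, u_by_rank_sum(group_a, group_b), u_b)
```

The two U values always sum to the number of possible pairs (here 5 × 6 = 30), which provides a quick consistency check on a hand calculation.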
Item Type:  Article  

Divisions : 
Faculty of Health and Medical Sciences > Surrey Clinical Research Centre Faculty of Health and Medical Sciences 

Date :  July 2016  
DOI :  10.1136/bjophthalmol-2015-308252  
Copyright Disclaimer :  Copyright 2016 the authors  
Depositing User :  Diane Maxfield  
Date Deposited :  23 Apr 2018 15:32  
Last Modified :  16 Jan 2019 19:09  
URI:  http://epubs.surrey.ac.uk/id/eprint/846289 