Alban Guillaumet, Troy University
Detecting deviations from Normality (I)
General options for handling violations of assumptions (II)
To check for normality, first (as always) look at your data. Histograms work best here.
The following data come from a normal distribution:
They don't look normal, but they are:
Examples of data from non-normal distributions:
Definition: The quantile of a measurement specifies the fraction of observations less than or equal to it. For instance, the first and third quartiles are the 0.25 and 0.75 quantiles, and the median is the 0.50 quantile.
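A minimal sketch of this definition in R, using a small made-up sample (the values in `x` are invented for illustration and do not come from the lecture data). Note that `quantile()` interpolates between observations by default, so its results can fall between observed values.

```r
# Hypothetical sample, invented for illustration only
x <- c(2, 4, 4, 5, 7, 9, 10, 12)

# Fraction of observations less than or equal to 5
frac_le_5 <- mean(x <= 5)

# Same quantity from the empirical cumulative distribution function
Fn <- ecdf(x)
Fn(5)

# First quartile, median, and third quartile (0.25, 0.50, 0.75 quantiles)
qs <- quantile(x, probs = c(0.25, 0.50, 0.75))
qs
```

The 0.50 quantile returned by `quantile()` matches `median(x)`.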
Definition: The normal quantile plot compares each observation in the sample with its quantile expected from the standard normal distribution. Points should fall roughly along a straight line if the data come from a normal distribution.
In R (note: by default qqnorm() puts the data on the y-axis; datax = TRUE puts them on the x-axis):
qqnorm(rnorm(100), datax = TRUE)
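To make the axis orientation concrete, here is a small sketch that draws both versions side by side and adds a reference line with `qqline()`. The seed and the object names (`qq_default`, `qq_flipped`) are arbitrary choices for this illustration.

```r
set.seed(1)  # arbitrary seed, only so the figure is reproducible
x <- rnorm(100)

op <- par(mfrow = c(1, 2))

# Default orientation: theoretical quantiles on x, data on y
qq_default <- qqnorm(x, main = "Default (data on y-axis)")
qqline(x, col = "red")

# datax = TRUE: data on x, theoretical quantiles on y
qq_flipped <- qqnorm(x, datax = TRUE, main = "datax = TRUE (data on x-axis)")
qqline(x, datax = TRUE, col = "red")

par(op)
```

`qqnorm()` invisibly returns the plotted coordinates, so the two calls hold the same points with `x` and `y` swapped.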
How does this work?
n <- 5; ( p <- (1:n)/(n+1) ); ( q <- qnorm(p, lower.tail = TRUE) )
[1] 0.1666667 0.3333333 0.5000000 0.6666667 0.8333333
[1] -0.9674216 -0.4307273 0.0000000 0.4307273 0.9674216
x <- sort(rnorm(100)) # (1) Sort measurements x
hist(x)
p <- (1:100)/101 # (2) Compute the estimated proportion of the distribution lying below an observation ranked i as i/(n+1)
q <- qnorm(p, lower.tail = TRUE) # (3) Compute the corresponding normal quantiles
plot(q ~ x, xlab="Sorted measurements", ylab="Normal quantiles"); abline(v = median(x), col = "red"); abline(h = 0, col = "blue") # (4) Plot measurements against computed quantiles (q vs x)
x <- sort(rexp(100)) # (1)
hist(x); abline(v = median(x), col = "red")
p <- (1:100)/101 # (2)
q <- qnorm(p, lower.tail = TRUE) # (3)
plot(q ~ x, xlab="Sorted measurements", ylab="Normal quantiles"); abline(v = median(x), col = "red"); abline(h = 0, col = "blue") # (4)
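The four steps above can be wrapped in a small helper so the same recipe is applied to any sample. This is only a sketch: `normal_quantiles()` is a hypothetical name introduced here, not a base R function, and the seed is arbitrary. Base R's `qqnorm()` uses `ppoints()` for the proportions, which differ slightly from i/(n+1), so its quantiles will not match these exactly.

```r
# Hypothetical helper: steps (1)-(3) for an arbitrary numeric vector
normal_quantiles <- function(x) {
  x <- sort(x)               # (1) sort the measurements
  n <- length(x)
  p <- (1:n) / (n + 1)       # (2) estimated proportion below rank i
  q <- qnorm(p)              # (3) corresponding standard normal quantiles
  data.frame(x = x, p = p, q = q)
}

set.seed(42)  # arbitrary seed for reproducibility
nq_norm <- normal_quantiles(rnorm(100))
nq_exp  <- normal_quantiles(rexp(100))

# (4) plot computed quantiles against sorted measurements
op <- par(mfrow = c(1, 2))
plot(q ~ x, data = nq_norm, main = "Normal sample",
     xlab = "Sorted measurements", ylab = "Normal quantiles")
plot(q ~ x, data = nq_exp, main = "Exponential sample",
     xlab = "Sorted measurements", ylab = "Normal quantiles")
par(op)

# Largest gap between the i/(n+1) quantiles and those used by qqnorm()
max_diff <- max(abs(nq_norm$q - qnorm(ppoints(100))))
```

Curvature in the second panel, compared with the roughly straight first panel, is the visual cue for a departure from normality.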
Normal distribution?
Question: Are marine reserves effective in preserving marine wildlife?
Research design
Halpern (2003) matched each of 32 marine reserves to a control location, which was either the site of the reserve before it became protected or a similar unprotected site nearby. The study then evaluated the “biomass ratio,” which is the total mass of all marine plants and animals per unit area of reserve divided by the same quantity in the unprotected control.
Discuss: Observational or experimental? Paired or unpaired? Interpret response measure in terms of effect of protection.
Answer: Observational. Paired (matching). Biomass ratio = 1 (no effect); > 1 (beneficial effect); < 1 (detrimental effect).
Definition: A Shapiro-Wilk test evaluates the goodness of fit of a normal distribution to a set of data randomly sampled from a population.
\( H_{0} \): The data are sampled from a population having a normal distribution.
\( H_{A} \): The data are sampled from a population NOT having a normal distribution.
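As a rough illustration of how these hypotheses are tested in R, the sketch below runs `shapiro.test()` on two simulated samples and extracts the test statistic and P-value from the returned object. The sample sizes, the seed, the significance level `alpha`, and all object names are assumptions made for this example.

```r
set.seed(123)  # arbitrary seed for reproducibility
normal_sample <- rnorm(50)  # drawn from a normal population
skewed_sample <- rexp(50)   # drawn from a right-skewed population

sw_normal <- shapiro.test(normal_sample)
sw_skewed <- shapiro.test(skewed_sample)

# The result is an "htest" object; W and the P-value can be extracted
sw_normal$statistic
sw_normal$p.value
sw_skewed$statistic
sw_skewed$p.value

# Decision rule at an assumed significance level of 0.05:
# reject H0 (normality) when the P-value is below alpha
alpha <- 0.05
reject_normal <- sw_normal$p.value < alpha
reject_skewed <- sw_skewed$p.value < alpha
```

Failing to reject \( H_{0} \) does not show that the population is normal; it only means the sample gives no strong evidence against normality.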
Caution:
par(cex.lab = 1.5)
marine <- read.csv("http://whitlockschluter.zoology.ubc.ca/wp-content/data/chapter13/chap13e1MarineReserve.csv")
hist(marine$biomassRatio)
shapiro.test(marine$biomassRatio)
Shapiro-Wilk normality test
data: marine$biomassRatio
W = 0.81751, p-value = 8.851e-05