M. Drew LaMar
October 5, 2020
“Standard normal deviate: Not to be confused with 'everyday ordinary pervert.' You don't often find a jargon term that seems to be both redundant and self-contradictory.”
- Whitlock & Schluter
Fisher’s exact test: (2 \( \times \) 2 tables only) Examines the independence of two categorical variables, even with small expected values.
\( G \)-test: (any table) Derived from principles of likelihood.
Pro: Great for complicated experimental designs with multiple explanatory variables.
Con: Can be less accurate for small sample sizes.
Situation: You've found a statistically significant association between two categorical variables using the \( \chi^2 \) contingency test.
Question: Where is the association? In other words, in which levels of the categories is the association present and how large is the association?
We need to estimate the magnitude of the association, which the \( P \)-value does not give us!
We can estimate odds ratios or relative risks for 2 \( \times \) 2 sub-tables within the contingency table by either subsetting or collapsing.
 | Uninfected | Lightly | Highly |
---|---|---|---|
Eaten | 1 | 10 | 37 |
Not eaten | 49 | 35 | 9 |
 | Uninfected | Highly |
---|---|---|
Eaten | 1 | 37 |
Not eaten | 49 | 9 |
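A minimal sketch of the subsetting step in R, assuming parTable holds the full 2 \( \times \) 3 table above (constructed here by hand for illustration):

# Full 2 x 3 contingency table from above
parTable <- matrix(c(1, 49, 10, 35, 37, 9), nrow = 2,
                   dimnames = list(c("Eaten", "Not eaten"),
                                   c("Uninfected", "Lightly", "Highly")))
parTable[, c(1, 3)]  # keep columns 1 and 3: the Uninfected vs. Highly sub-table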
library(epitools)  # provides oddsratio()
oddsratio(parTable[, c(1, 3)], method = "wald")
$data
           Uninfected Highly Total
Eaten               1     37    38
Not eaten          49      9    58
Total              50     46    96

$measure
            odds ratio with 95% C.I.
NA             estimate        lower      upper
  Eaten     1.000000000           NA         NA
  Not eaten 0.004964148 0.0006020703 0.04093004

$p.value
            two-sided
NA            midp.exact fisher.exact   chi.square
  Eaten              NA           NA           NA
  Not eaten 1.110223e-16 6.861412e-17 4.140762e-15

$correction
[1] FALSE

attr(,"method")
[1] "Unconditional MLE & normal approximation (Wald) CI"
Definition: The normal distribution is a continuous probability distribution describing a bell-shaped curve. It is a good approximation to the frequency distributions of many biological variables.
The probability density function \( f(Y) \) for a normally distributed random variable is given by \[ f(Y) = \frac{1}{\sqrt{2\pi\sigma^2}}e^{\frac{-(Y-\mu)^2}{2\sigma^2}}, \] where \( \mu \) and \( \sigma \) are the mean and standard deviation of \( Y \), respectively.
# Plot the density curve of a normal distribution with mean 5 and sd 2
x <- seq(from = -2, to = 12, length.out = 1000)
y <- dnorm(x, mean = 5, sd = 2)
plot(x, y, type = "l", cex.axis = 1.5, cex.lab = 1.5)
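As a quick sanity check (the evaluation point \( Y = 6 \) is an arbitrary choice), the formula above agrees with R's built-in dnorm():

mu <- 5; sigma <- 2; y0 <- 6
exp(-(y0 - mu)^2 / (2 * sigma^2)) / sqrt(2 * pi * sigma^2)  # density formula by hand
dnorm(y0, mean = mu, sd = sigma)                            # same value, ~0.1760327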
Name | R command | Uses |
---|---|---|
PDF | dnorm(x, mean, sd) | - |
CDF | pnorm(q, mean, sd, lower.tail=TRUE) | - |
CCDF | pnorm(q, mean, sd, lower.tail=FALSE) | Compute \( P \)-values |
QF | qnorm(p, mean, sd, lower.tail=TRUE) | - |
CQF | qnorm(p, mean, sd, lower.tail=FALSE) | Compute critical values |
Defaults: mean = 0 and sd = 1 (standard normal deviate).
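A brief illustration of the table (the cutoff 1.96 is an arbitrary example): pnorm() and qnorm() are inverses of one another, and the lower.tail=FALSE versions work with upper-tail areas:

pnorm(1.96)                       # CDF: Pr[Z < 1.96] = 0.9750021
pnorm(1.96, lower.tail = FALSE)   # CCDF: Pr[Z > 1.96] = 0.0249979
qnorm(0.975)                      # QF: value with 97.5% of the area below it, 1.959964
qnorm(0.025, lower.tail = FALSE)  # CQF: critical value with 2.5% above it, 1.959964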
Discuss: With \( \mu=\sigma=2 \), what is \( \mathrm{Pr[} Y > 4\mathrm{]} \)?
\[ Y \sim N(\mu,\sigma^2) = N(2,4) \]
Question: With \( Y \sim N(2,4) \), what is \( \mathrm{Pr[} Y > 4\mathrm{]} \)?
(prob <- pnorm(4, mean = 2, sd = 2, lower.tail = FALSE))
[1] 0.1586553
Question: With \( Y \sim N(2,4) \), what is \( \mathrm{Pr[} Y < 4\mathrm{]} \)?
(prob <- pnorm(4, mean = 2, sd = 2, lower.tail = TRUE))
[1] 0.8413447
Question: With \( Y \sim N(2,4) \), what is \( \mathrm{Pr[} 2 < Y < 4\mathrm{]} \)?
\[ \mathrm{Pr[} 2 < Y < 4\mathrm{]} = \mathrm{Pr[} Y > 2\mathrm{]} - \mathrm{Pr[} Y > 4\mathrm{]} \]
Question: With \( Y \sim N(2,4) \), what is \( \mathrm{Pr[} 2 < Y < 4\mathrm{]} \)?
(prob <- pnorm(2, mean = 2, sd = 2, lower.tail = FALSE) -
         pnorm(4, mean = 2, sd = 2, lower.tail = FALSE))
[1] 0.3413447
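The same interval probability can also be computed directly as a difference of CDF values:

diff(pnorm(c(2, 4), mean = 2, sd = 2))  # F(4) - F(2) = 0.3413447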
\[ \begin{eqnarray*} f(Y) & = & \frac{1}{\sqrt{2\pi\sigma^2}}e^{\frac{-(Y-\mu)^2}{2\sigma^2}} \\ & = & \frac{1}{\sqrt{2\pi\sigma^2}}e^{-\frac{1}{2}\left(\frac{Y-\mu}{\sigma}\right)^2} \end{eqnarray*} \]
Letting \( Z = \frac{Y-\mu}{\sigma} \), we have
\[ f(Z) = \frac{1}{\sqrt{2\pi}}e^{-\frac{1}{2}Z^2}. \]
The mean of \( Z \) is zero and the standard deviation of \( Z \) is one.
Definition: The standard normal deviate \[ Z = \frac{Y-\mu}{\sigma} \] tells us how many standard deviations \( \sigma \) a particular \( Y \) value is from the mean \( \mu \).
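For example, standardizing \( Y = 4 \) from the questions above:

y0 <- 4; mu <- 2; sigma <- 2
(z <- (y0 - mu) / sigma)  # Y = 4 lies exactly 1 standard deviation above the mean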
Question: With \( Y \sim N(\mu=2,\sigma^2=4) \), what is \( \mathrm{Pr[} 2 < Y < 4\mathrm{]} \)?
\[ \begin{eqnarray*} \mathrm{Pr[} 2 < Y < 4\mathrm{]} & = & \mathrm{Pr}\left[ \frac{2-2}{2} < Z < \frac{4-2}{2}\right] \\ & = & \mathrm{Pr[}0 < Z < 1\mathrm{]} \\ & = & \mathrm{Pr[}Z > 0\mathrm{]} - \mathrm{Pr[}Z > 1\mathrm{]} \end{eqnarray*} \]
Question: With \( Y \sim N(\mu=2,\sigma^2=4) \), what is \( \mathrm{Pr[} 2 < Y < 4\mathrm{]} \)?
\[ \begin{eqnarray*} \mathrm{Pr[} 2 < Y < 4\mathrm{]} & = & \mathrm{Pr[}Z > 0\mathrm{]} - \mathrm{Pr[}Z > 1\mathrm{]} \\ & = & 0.5 - 0.1587 \\ & = & 0.3413 \end{eqnarray*} \]
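Checking this in R, using pnorm()'s standard-normal defaults (mean = 0, sd = 1):

pnorm(0, lower.tail = FALSE) - pnorm(1, lower.tail = FALSE)
# [1] 0.3413447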
Theorem: If a variable \( Y \) has a normal distribution in a population, then the distribution of sample means \( \bar{Y} \) is also normal.
Theorem: \( Y \sim N(\mu,\sigma^2) \Rightarrow \bar{Y} \sim N(\mu,\sigma_{\bar{Y}}^2) \), where \( n \) is the sample size and \( \sigma_{\bar{Y}} \) is the standard error of the mean, given by \[ \sigma_{\bar{Y}} = \frac{\sigma}{\sqrt{n}}. \]
Central Limit Theorem: According to the central limit theorem, the sum or mean of a large number of measurements randomly sampled from a non-normal population is approximately normally distributed.
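A quick simulation sketch (the exponential population, the sample size \( n = 30 \), and the 10,000 replicates are all illustrative choices): means of repeated samples from a strongly skewed population come out roughly bell-shaped, with spread close to \( \sigma/\sqrt{n} \):

set.seed(1)
n <- 30
means <- replicate(10000, mean(rexp(n, rate = 1)))  # exponential population: mu = sigma = 1
hist(means, breaks = 50)   # roughly normal, centered near mu = 1
c(sd(means), 1 / sqrt(n))  # empirical SD of the means vs. sigma / sqrt(n) = 0.183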
Discuss: Why does the normal distribution show up so often in many apparently unrelated fields of study?
Definition: The normal distribution arises naturally from the combination of a large number of independent random events or factors.
“A typical example is a person's height, which is determined by a combination of many independent factors, both genetic and environmental. Each of these factors may tend to increase or decrease a person's height, just as a ball in Galton's board may bounce to the right or the left at each level. As Galton's board shows, when you combine many chance factors, the resulting distribution is binomial. By the Central Limit Theorem, when the number of independent factors is very large, the binomial distribution is approximated by a normal curve.”
Paul Trow (http://ptrow.com/articles/Galton_June_07.htm)