M. Drew LaMar
October 29, 2021
“Standard normal deviate: Not to be confused with 'everyday ordinary pervert.' You don't often find a jargon term that seems to be both redundant and self-contradictory.”
- Whitlock & Schluter
Definition: The normal distribution is a continuous probability distribution describing a bell-shaped curve. It is a good approximation to the frequency distributions of many biological variables.
The probability density function \( f(Y) \) for a normal random variable \( Y \) is given by \[ f(Y) = \frac{1}{\sqrt{2\pi\sigma^2}}e^{\frac{-(Y-\mu)^2}{2\sigma^2}}, \] where \( \mu \) and \( \sigma \) are the mean and standard deviation of \( Y \), respectively.
x <- seq(from=-2, to=12, length.out=1000)
y <- dnorm(x, mean=5, sd=2)
plot(x, y, type="l", cex.axis=1.5, cex.lab=1.5)
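As a quick check that `dnorm` evaluates the density formula given above, the following sketch codes the formula by hand and compares the two. The helper name `normal_pdf` is hypothetical (it is not part of R); the grid and parameters are the ones used for the plot.

```r
# Hand-coded normal density, written directly from the formula for f(Y).
# `normal_pdf` is an illustrative helper, not a built-in R function.
normal_pdf <- function(y, mu, sigma) {
  1 / sqrt(2 * pi * sigma^2) * exp(-(y - mu)^2 / (2 * sigma^2))
}

x <- seq(from = -2, to = 12, length.out = 1000)
by_hand  <- normal_pdf(x, mu = 5, sigma = 2)
built_in <- dnorm(x, mean = 5, sd = 2)

# TRUE if the two agree up to floating-point tolerance
all.equal(by_hand, built_in)
```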
Name | R command | Uses |
---|---|---|
PDF | dnorm(x, mean, sd) | - |
CDF | pnorm(q, mean, sd, lower.tail=TRUE) | - |
CCDF | pnorm(q, mean, sd, lower.tail=FALSE) | Compute \( P \)-values |
QF | qnorm(p, mean, sd, lower.tail=TRUE) | - |
CQF | qnorm(p, mean, sd, lower.tail=FALSE) | Compute critical values |
Defaults: mean = 0 and sd = 1 (standard normal deviate)
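A small sketch of how the tail and quantile functions relate, relying on the defaults `mean = 0` and `sd = 1`. The cutoff 1.96 and the tail area 0.025 are illustrative values chosen here, not taken from the slides.

```r
# Upper-tail probability (the kind of area used for a P-value)
# for a standard normal; mean = 0 and sd = 1 are the defaults.
p_upper <- pnorm(1.96, lower.tail = FALSE)

# Critical value that cuts off 2.5% of the area in the upper tail
z_crit <- qnorm(0.025, lower.tail = FALSE)

# qnorm undoes pnorm: feeding the tail area back in recovers 1.96
z_back <- qnorm(p_upper, lower.tail = FALSE)

c(p_upper = p_upper, z_crit = z_crit, z_back = z_back)
```

Here `p_upper` is about 0.025 and `z_crit` is about 1.96, so the two functions are inverses of each other for a fixed tail.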
Discuss: With \( \mu=\sigma=2 \), what is \( \mathrm{Pr[} Y > 4\mathrm{]} \)?
\[ Y \sim N(\mu,\sigma^2) = N(2,4) \]
Question: With \( Y \sim N(2,4) \), what is \( \mathrm{Pr[} Y > 4\mathrm{]} \)?
(prob <- pnorm(4, mean=2, sd=2, lower.tail=FALSE))
[1] 0.1586553
Question: With \( Y \sim N(2,4) \), what is \( \mathrm{Pr[} Y < 4\mathrm{]} \)?
(prob <- pnorm(4, mean=2, sd=2, lower.tail=TRUE))
[1] 0.8413447
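The two answers so far are complementary: because the distribution is continuous, \( \mathrm{Pr[} Y < 4\mathrm{]} = 1 - \mathrm{Pr[} Y > 4\mathrm{]} \). A minimal check, with the variable names `p_gt` and `p_lt` chosen here for illustration:

```r
# Upper and lower tail areas at the same cutoff
p_gt <- pnorm(4, mean = 2, sd = 2, lower.tail = FALSE)  # Pr[Y > 4]
p_lt <- pnorm(4, mean = 2, sd = 2, lower.tail = TRUE)   # Pr[Y < 4]

# The two tails together cover the whole distribution,
# so the sum is 1 up to floating-point rounding
p_gt + p_lt
```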
Question: With \( Y \sim N(2,4) \), what is \( \mathrm{Pr[} 2 < Y < 4\mathrm{]} \)?
\[ \mathrm{Pr[} 2 < Y < 4\mathrm{]} = \mathrm{Pr[} Y > 2\mathrm{]} - \mathrm{Pr[} Y > 4\mathrm{]} \]
(prob <- pnorm(2, mean=2, sd=2, lower.tail=FALSE) -
         pnorm(4, mean=2, sd=2, lower.tail=FALSE))
[1] 0.3413447
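The same interval probability can be reached in other ways. The sketch below compares the upper-tail difference with a difference of CDF values and with direct numerical integration of the density; the variable names are illustrative, and `integrate` is used with its default tolerances.

```r
# Difference of upper-tail areas
prob_upper <- pnorm(2, mean = 2, sd = 2, lower.tail = FALSE) -
  pnorm(4, mean = 2, sd = 2, lower.tail = FALSE)

# Difference of lower-tail (CDF) values: Pr[Y < 4] - Pr[Y < 2]
prob_cdf <- pnorm(4, mean = 2, sd = 2) - pnorm(2, mean = 2, sd = 2)

# Area under the density curve between 2 and 4;
# extra arguments (mean, sd) are passed through to dnorm
prob_int <- integrate(dnorm, lower = 2, upper = 4, mean = 2, sd = 2)$value

c(upper = prob_upper, cdf = prob_cdf, integral = prob_int)
```

All three values agree to about 0.3413.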
\[ \begin{eqnarray*} f(Y) & = & \frac{1}{\sqrt{2\pi\sigma^2}}e^{\frac{-(Y-\mu)^2}{2\sigma^2}} \\ & = & \frac{1}{\sqrt{2\pi\sigma^2}}e^{-\frac{1}{2}\left(\frac{Y-\mu}{\sigma}\right)^2} \end{eqnarray*} \]
Letting \( Z = \frac{Y-\mu}{\sigma} \), we have
\[ f(Z) = \frac{1}{\sqrt{2\pi}}e^{-\frac{1}{2}Z^2}. \]
The mean of \( Z \) is zero and the standard deviation of \( Z \) is one.
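A rough numerical check of this claim, assuming the default tolerances of `integrate` are adequate: the mean is the integral of \( Z f(Z) \) and the variance is the integral of \( (Z - \text{mean})^2 f(Z) \), both over the whole real line. The variable names are illustrative.

```r
# Mean of Z: integral of z * f(z) over the real line
mean_z <- integrate(function(z) z * dnorm(z),
                    lower = -Inf, upper = Inf)$value

# Variance of Z: integral of (z - mean)^2 * f(z)
var_z <- integrate(function(z) (z - mean_z)^2 * dnorm(z),
                   lower = -Inf, upper = Inf)$value

# Should be close to mean 0 and standard deviation 1
c(mean = mean_z, sd = sqrt(var_z))
```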
Definition: The standard normal deviate \[ Z = \frac{Y-\mu}{\sigma} \] tells us how many standard deviations \( \sigma \) a particular \( Y \) value is from the mean \( \mu \).
Question: With \( Y \sim N(\mu=2,\sigma^2=4) \), what is \( \mathrm{Pr[} 2 < Y < 4\mathrm{]} \)?
\[ \begin{eqnarray*} \mathrm{Pr[} 2 < Y < 4\mathrm{]} & = & \mathrm{Pr}\left[ \frac{2-2}{2} < Z < \frac{4-2}{2}\right] \\ & = & \mathrm{Pr[}0 < Z < 1\mathrm{]} \\ & = & \mathrm{Pr[}Z > 0\mathrm{]} - \mathrm{Pr[}Z > 1\mathrm{]} \end{eqnarray*} \]
\[ \begin{eqnarray*} \mathrm{Pr[} 2 < Y < 4\mathrm{]} & = & \mathrm{Pr[}Z > 0\mathrm{]} - \mathrm{Pr[}Z > 1\mathrm{]} \\ & = & 0.5 - 0.1587 \\ & = & 0.3413 \end{eqnarray*} \]
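The standardization can be mirrored in R by converting the endpoints to \( Z \) values and calling `pnorm` with its default `mean = 0` and `sd = 1`. This is a sketch with illustrative variable names.

```r
mu <- 2
sigma <- 2

# Convert the endpoints 2 and 4 to standard normal deviates
z_lo <- (2 - mu) / sigma  # 0
z_hi <- (4 - mu) / sigma  # 1

# Pr[0 < Z < 1] = Pr[Z > 0] - Pr[Z > 1], using the standard normal defaults
prob_z <- pnorm(z_lo, lower.tail = FALSE) - pnorm(z_hi, lower.tail = FALSE)
prob_z  # about 0.3413, matching the hand calculation
```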
Theorem: If a variable \( Y \) has a normal distribution in a population, then the distribution of sample means \( \bar{Y} \) is also normal.
Theorem: \( Y \sim N(\mu,\sigma^2) \Rightarrow \bar{Y} \sim N(\mu,\sigma_{\bar{Y}}^2) \), where \( \sigma_{\bar{Y}} \) is the standard error of the mean given by
\[ \sigma_{\bar{Y}} = \frac{\sigma}{\sqrt{n}}. \]
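A simulation sketch of the standard error formula. The sample size `n = 25`, the number of replicates, and the seed are arbitrary choices made here for illustration; the population is the \( N(2,4) \) used in the earlier questions.

```r
set.seed(1)  # arbitrary seed, only for reproducibility

mu <- 2
sigma <- 2
n <- 25        # size of each sample (assumed for illustration)
reps <- 10000  # number of repeated samples

# Draw many samples of size n and record each sample mean
sample_means <- replicate(reps, mean(rnorm(n, mean = mu, sd = sigma)))

se_theory <- sigma / sqrt(n)   # 0.4 from the formula
se_sim    <- sd(sample_means)  # spread of the simulated sample means

c(theory = se_theory, simulated = se_sim)
```

The simulated spread should land close to the theoretical value, and it moves closer as `reps` grows.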
Central Limit Theorem: According to the central limit theorem, the sum or mean of a large number of measurements randomly sampled from a non-normal population is approximately normally distributed.
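One way to see the theorem at work is to sample from a clearly non-normal population. The sketch below assumes an exponential population with rate 1 (mean 1, standard deviation 1); the sample size, replicate count, and seed are arbitrary illustrative choices.

```r
set.seed(2)  # arbitrary seed, only for reproducibility

n <- 50        # measurements per sample (assumed for illustration)
reps <- 10000  # number of repeated samples

# Means of samples drawn from a right-skewed exponential population
means <- replicate(reps, mean(rexp(n, rate = 1)))

# Histogram of the sample means with the approximating normal curve,
# which has mean 1 and standard deviation 1 / sqrt(n)
hist(means, breaks = 40, freq = FALSE,
     main = "Sample means, exponential population")
curve(dnorm(x, mean = 1, sd = 1 / sqrt(n)), add = TRUE, lwd = 2)
```

With `n = 50` the histogram is already close to the overlaid curve, although a slight right skew may remain.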
Discuss: Why does the normal distribution show up so often in many apparently unrelated fields of study?
Definition: The normal distribution arises naturally from the combination of a large number of independent random events or factors.
“A typical example is a person's height, which is determined by a combination of many independent factors, both genetic and environmental. Each of these factors may tend to increase or decrease a person's height, just as a ball in Galton's board may bounce to the right or the left at each level. As Galton's board shows, when you combine many chance factors, the resulting distribution is binomial. By the Central Limit Theorem, when the number of independent factors is very large, the binomial distribution is approximated by a normal curve.”
Paul Trow (http://ptrow.com/articles/Galton_June_07.htm)
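Galton's board can be imitated in a few lines. This is a sketch under stated assumptions: each ball bounces right with probability 0.5 at every level, and the number of levels, the number of balls, and the seed are arbitrary choices made here.

```r
set.seed(3)  # arbitrary seed, only for reproducibility

n_levels <- 100  # rows of pegs (assumed for illustration)
n_balls <- 10000

# Final position of each ball = number of bounces to the right,
# which is binomial with size n_levels and probability 0.5
positions <- rbinom(n_balls, size = n_levels, prob = 0.5)

# Normal curve with the binomial's mean and standard deviation
approx_mean <- n_levels * 0.5
approx_sd <- sqrt(n_levels * 0.5 * 0.5)

hist(positions, freq = FALSE,
     breaks = seq(min(positions) - 0.5, max(positions) + 0.5, by = 1),
     main = "Simulated Galton board")
curve(dnorm(x, mean = approx_mean, sd = approx_sd), add = TRUE, lwd = 2)
```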