The normal distribution

M. Drew LaMar
October 29, 2021

“Standard normal deviate: Not to be confused with 'everyday ordinary pervert.' You don't often find a jargon term that seems to be both redundant and self-contradictory.”

- Whitlock & Schluter

The normal distribution

Definition: The normal distribution is a continuous probability distribution describing a bell-shaped curve. It is a good approximation to the frequency distributions of many biological variables.

The normal distribution - Equation

The probability density function \( f(Y) \) for a normally distributed random variable \( Y \) is given by \[ f(Y) = \frac{1}{\sqrt{2\pi\sigma^2}}e^{\frac{-(Y-\mu)^2}{2\sigma^2}}, \] where \( \mu \) and \( \sigma \) are the mean and standard deviation of \( Y \), respectively.
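As a quick check, the density formula can be typed in directly and compared with R's built-in `dnorm()`. This is a minimal sketch: the helper name `normal_pdf` is hypothetical (not part of R), and the grid of \( Y \) values and the parameters \( \mu = 5 \), \( \sigma = 2 \) are illustrative choices.

```r
# Hypothetical helper: the normal density written out term by term
normal_pdf <- function(y, mu, sigma) {
  1 / sqrt(2 * pi * sigma^2) * exp(-(y - mu)^2 / (2 * sigma^2))
}

y <- seq(from = -2, to = 12, by = 0.5)

# Should agree with R's built-in density up to floating-point error
all.equal(normal_pdf(y, mu = 5, sigma = 2), dnorm(y, mean = 5, sd = 2))

# The total area under a probability density curve is one
integrate(normal_pdf, lower = -Inf, upper = Inf, mu = 5, sigma = 2)$value
```

Extra arguments such as `mu` and `sigma` are passed through `integrate()` to the function being integrated.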

The normal distribution - In R

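# Grid of x values: the mean (5) plus or minus 3.5 standard deviations (2)
x <- seq(from=-2, to=12, length.out=1000)
# Normal density with mean 5 and standard deviation 2
y <- dnorm(x, mean=5, sd=2)
# Draw the bell curve as a line; cex.* enlarge the axis text for slides
plot(x, y, type="l", cex.axis=1.5, cex.lab=1.5)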


Summary of functions for normal dist.

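Name   Meaning                            R command                               Uses
PDF    probability density function       dnorm(x, mean, sd)                      -
CDF    cumulative distribution function   pnorm(q, mean, sd, lower.tail=TRUE)     -
CCDF   complementary CDF (upper tail)     pnorm(q, mean, sd, lower.tail=FALSE)    Compute \( P \)-values
QF     quantile function (inverse CDF)    qnorm(p, mean, sd, lower.tail=TRUE)     -
CQF    complementary quantile function    qnorm(p, mean, sd, lower.tail=FALSE)    Compute critical values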

Defaults: mean = 0 and sd = 1 (standard normal deviate)
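A short sketch of how these functions relate, using the default standard normal. The cutoff 1.96 and the tail area 0.025 are illustrative values, not taken from the slides.

```r
# CCDF: upper-tail probability Pr[Z > 1.96]
upper <- pnorm(1.96, lower.tail = FALSE)
upper

# CDF and CCDF are complements, so they sum to one
pnorm(1.96) + upper

# The quantile function (QF) undoes the CDF
qnorm(pnorm(1.96))

# CQF: the critical value with 2.5% of the area in the upper tail
crit <- qnorm(0.025, lower.tail = FALSE)
crit

# Same critical value from the ordinary QF, using 1 - 0.025
qnorm(0.975)
```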

Discuss: With \( \mu=\sigma=2 \), what is \( \mathrm{Pr[} Y > 4\mathrm{]} \)?

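\[ Y \sim N(\mu,\sigma^2) = N(2,4) \]

Note that the second parameter of \( N(\cdot,\cdot) \) is the variance \( \sigma^2 = 4 \), whereas R's functions expect the standard deviation, so we pass sd=2.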

Computing probs - Greater than

Question: With \( Y \sim N(2,4) \), what is \( \mathrm{Pr[} Y > 4\mathrm{]} \)?


(prob <- pnorm(4, 
               mean=2, 
               sd=2, 
               lower.tail=FALSE))
[1] 0.1586553

Computing probs - Less than

Question: With \( Y \sim N(2,4) \), what is \( \mathrm{Pr[} Y < 4\mathrm{]} \)?


(prob <- pnorm(4, 
               mean=2, 
               sd=2, 
               lower.tail=TRUE))
[1] 0.8413447
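The two answers above are complements. A minimal sketch confirming this, and showing the common slip of passing the variance where R expects the standard deviation:

```r
# Y ~ N(2, 4): mean 2, variance 4, so the standard deviation is 2
p_less    <- pnorm(4, mean = 2, sd = 2)                      # Pr[Y < 4]
p_greater <- pnorm(4, mean = 2, sd = 2, lower.tail = FALSE)  # Pr[Y > 4]

# Y is continuous, so Pr[Y = 4] = 0 and the two tails sum to one
p_less + p_greater

# Hence the upper tail is also 1 minus the CDF
1 - p_less

# Common slip: passing the variance (4) as sd gives a different, wrong answer
pnorm(4, mean = 2, sd = 4, lower.tail = FALSE)
```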

Computing probs - Between

Question: With \( Y \sim N(2,4) \), what is \( \mathrm{Pr[} 2 < Y < 4\mathrm{]} \)?

\[ \mathrm{Pr[} 2 < Y < 4\mathrm{]} = \mathrm{Pr[} Y > 2\mathrm{]} - \mathrm{Pr[} Y > 4\mathrm{]} \]


(prob <- 
   pnorm(2, 
         mean=2, sd=2, 
         lower.tail=FALSE) - 
   pnorm(4, 
         mean=2, sd=2, 
         lower.tail=FALSE))
[1] 0.3413447
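The same probability can be written as a difference of lower-tail probabilities, \( \mathrm{Pr[} Y < 4\mathrm{]} - \mathrm{Pr[} Y < 2\mathrm{]} \), or checked by numerically integrating the density between the two bounds. A minimal sketch of both:

```r
# Difference of lower-tail probabilities (CDFs): Pr[Y < 4] - Pr[Y < 2]
p_cdf <- pnorm(4, mean = 2, sd = 2) - pnorm(2, mean = 2, sd = 2)

# Difference of upper-tail probabilities, as in the code above
p_ccdf <- pnorm(2, mean = 2, sd = 2, lower.tail = FALSE) -
          pnorm(4, mean = 2, sd = 2, lower.tail = FALSE)

all.equal(p_cdf, p_ccdf)

# Area under the density curve between 2 and 4
area <- integrate(dnorm, lower = 2, upper = 4, mean = 2, sd = 2)$value
area
```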

Standard normal deviates

\[ \begin{eqnarray*} f(Y) & = & \frac{1}{\sqrt{2\pi\sigma^2}}e^{\frac{-(Y-\mu)^2}{2\sigma^2}} \\ & = & \frac{1}{\sqrt{2\pi\sigma^2}}e^{-\frac{1}{2}\left(\frac{Y-\mu}{\sigma}\right)^2} \end{eqnarray*} \]

Letting \( Z = \frac{Y-\mu}{\sigma} \) (so that \( dY = \sigma\,dZ \), which cancels the \( \sigma \) in the normalizing constant), we have

\[ f(Z) = \frac{1}{\sqrt{2\pi}}e^{-\frac{1}{2}Z^2}. \]

The mean of \( Z \) is zero and the standard deviation of \( Z \) is one.
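Two quick numerical checks of this change of variables, as a sketch with illustrative values \( \mu = \sigma = 2 \): the density of \( Y \) equals the standard normal density at \( z \) divided by \( \sigma \), and standardizing simulated normal data gives a mean near zero and a standard deviation near one (the seed and sample size are arbitrary choices).

```r
mu <- 2; sigma <- 2
y <- c(-1, 0, 2, 3.5, 4, 7)
z <- (y - mu) / sigma

# f(Y) = f(Z) / sigma: the 1/sigma is the factor from dY = sigma dZ
all.equal(dnorm(y, mean = mu, sd = sigma), dnorm(z) / sigma)

# Simulation: standardized normal draws have mean about 0 and sd about 1
set.seed(1)
y_sim <- rnorm(1e5, mean = mu, sd = sigma)
z_sim <- (y_sim - mu) / sigma
c(mean = mean(z_sim), sd = sd(z_sim))
```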

Standard normal deviates

Definition: The standard normal deviate

\[ Z = \frac{Y-\mu}{\sigma} \] tells us how many standard deviations \( \sigma \) a particular \( Y \) value is from the mean \( \mu \).
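For example, with \( \mu = 2 \) and \( \sigma = 2 \), the value \( Y = 4 \) lies one standard deviation above the mean. A minimal sketch (the \( Y \) values are illustrative) showing that standardizing leaves probabilities unchanged, \( \mathrm{Pr[} Y > y\mathrm{]} = \mathrm{Pr[} Z > z\mathrm{]} \):

```r
mu <- 2; sigma <- 2
y <- c(0, 2, 4, 6)

# Number of standard deviations each value lies from the mean
z <- (y - mu) / sigma
z   # gives -1, 0, 1, 2

# Upper-tail probabilities on the Y scale and on the Z scale agree
all.equal(pnorm(y, mean = mu, sd = sigma, lower.tail = FALSE),
          pnorm(z, lower.tail = FALSE))
```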

Standard normal tables

Question: With \( Y \sim N(\mu=2,\sigma^2=4) \), what is \( \mathrm{Pr[} 2 < Y < 4\mathrm{]} \)?

\[ \begin{eqnarray*} \mathrm{Pr[} 2 < Y < 4\mathrm{]} & = & \mathrm{Pr}\left[ \frac{2-2}{2} < Z < \frac{4-2}{2}\right] \\ & = & \mathrm{Pr[}0 < Z < 1\mathrm{]} \\ & = & \mathrm{Pr[}Z > 0\mathrm{]} - \mathrm{Pr[}Z > 1\mathrm{]} \end{eqnarray*} \]

\[ \begin{eqnarray*} \mathrm{Pr[} 2 < Y < 4\mathrm{]} & = & \mathrm{Pr[}Z > 0\mathrm{]} - \mathrm{Pr[}Z > 1\mathrm{]} \\ & = & 0.5 - 0.1587 \\ & = & 0.3413 \end{eqnarray*} \]
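The table entries 0.5 and 0.1587 can be reproduced with `pnorm()`. This sketch builds a small upper-tail table, \( \mathrm{Pr[} Z > z\mathrm{]} \), rounded to four decimals like a printed table; the grid of \( z \) values is an illustrative choice.

```r
# Upper-tail probabilities Pr[Z > z], rounded like a printed table
z <- seq(from = 0, to = 3, by = 0.5)
tab <- data.frame(z = z, upper_tail = round(pnorm(z, lower.tail = FALSE), 4))
print(tab)

# The two entries used above and their difference
p0 <- pnorm(0, lower.tail = FALSE)   # Pr[Z > 0]
p1 <- pnorm(1, lower.tail = FALSE)   # Pr[Z > 1]
round(p0 - p1, 4)
```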

Normal distribution of sample means

Theorem: If a variable \( Y \) has a normal distribution in a population, then the distribution of sample means \( \bar{Y} \) from random samples of size \( n \) is also normal.

Theorem: \( Y \sim N(\mu,\sigma^2) \Rightarrow \bar{Y} \sim N(\mu,\sigma_{\bar{Y}}^2) \), where \( \sigma_{\bar{Y}} \) is the standard error of the mean given by

\[ \sigma_{\bar{Y}} = \frac{\sigma}{\sqrt{n}}. \]
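A simulation sketch of this result: draw many random samples of size \( n \) from \( N(2, 4) \), record each sample mean, and compare the spread of those means with \( \sigma/\sqrt{n} \). The sample size \( n = 25 \), the number of replicates, and the seed are illustrative assumptions.

```r
set.seed(42)
mu <- 2; sigma <- 2
n <- 25             # sample size (illustrative)
n_samples <- 10000  # number of simulated samples (illustrative)

# Mean of each simulated sample of size n
sample_means <- replicate(n_samples, mean(rnorm(n, mean = mu, sd = sigma)))

# Center of the sampling distribution vs. the population mean
c(simulated_mean = mean(sample_means), mu = mu)

# Spread of the sampling distribution vs. the standard error sigma / sqrt(n)
c(simulated_se = sd(sample_means), theoretical_se = sigma / sqrt(n))
```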

Central limit theorem

Central Limit Theorem: The sum or mean of a large number of measurements randomly sampled from a non-normal population is approximately normally distributed.

http://www.zoology.ubc.ca/~whitlock/kingfisher/CLT.htm
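A minimal simulation sketch of the theorem, assuming a strongly right-skewed exponential population with rate 1 (mean 1, standard deviation 1); the population, sample size, number of replicates, and the cutoff 1.2 are illustrative choices. The CLT predicts the sample means are approximately \( N(1, 1/n) \).

```r
set.seed(7)
n <- 50             # sample size (illustrative)
n_samples <- 10000  # number of simulated samples (illustrative)

# Means of samples drawn from a skewed, non-normal population
sample_means <- replicate(n_samples, mean(rexp(n, rate = 1)))

# Compare with the CLT prediction: mean 1, standard deviation 1 / sqrt(n)
c(simulated_mean = mean(sample_means), predicted_mean = 1)
c(simulated_sd = sd(sample_means), predicted_sd = 1 / sqrt(n))

# One tail probability: simulation vs. the normal approximation
sim_tail  <- mean(sample_means > 1.2)
norm_tail <- pnorm(1.2, mean = 1, sd = 1 / sqrt(n), lower.tail = FALSE)
c(simulated = sim_tail, normal_approx = norm_tail)
```

The agreement is approximate, not exact: with \( n = 50 \) some of the population's skew still shows up in the tail.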

Normal approx. to binomial dist.

Discuss: Why does the normal distribution show up so often in many apparently unrelated fields of study?

Definition: The normal distribution arises naturally from the combined (additive) effect of a large number of independent random events or factors.

Normal approx. to binomial dist.

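  • Galton's board: at each pin, flip a coin; on heads the ball bounces right, on tails it bounces left
  • The number of heads determines which positive-slope “lane” the ball travels in, and hence its final bin
  • Overlaying Pascal's triangle gives the number of paths leading to each pin
  • Each path through \( n \) rows has probability \( (1/2)^n \), so running the machine weights the path counts by this probability
  • Thus, we get a binomial distribution!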
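The Galton board argument can be simulated directly. In this sketch each ball flips a fair coin at each of 10 rows, and its final bin is its number of heads; the row of Pascal's triangle, times \( (1/2)^{10} \), should match both the simulated bin proportions and R's `dbinom()`. The number of rows, number of balls, and seed are illustrative assumptions.

```r
set.seed(3)
n_rows  <- 10     # rows of pins (illustrative)
n_balls <- 10000  # balls dropped (illustrative)

# 1 = heads (bounce right), 0 = tails (bounce left); bin = number of heads
bins <- replicate(n_balls, sum(sample(c(0, 1), size = n_rows, replace = TRUE)))

# Observed proportion of balls landing in each bin 0..n_rows
observed <- as.vector(table(factor(bins, levels = 0:n_rows))) / n_balls

# Pascal's triangle row (number of paths) times the probability of each path
paths    <- choose(n_rows, 0:n_rows)
expected <- paths * 0.5^n_rows

round(cbind(bin = 0:n_rows, paths, observed, expected), 4)

# The path-counting probabilities are exactly the binomial probabilities
all.equal(expected, dbinom(0:n_rows, size = n_rows, prob = 0.5))
```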

Normal approx. to binomial dist.

“A typical example is a person's height, which is determined by a combination of many independent factors, both genetic and environmental. Each of these factors may tend to increase or decrease a person's height, just as a ball in Galton's board may bounce to the right or the left at each level. As Galton's board shows, when you combine many chance factors, the resulting distribution is binomial. By the Central Limit Theorem, when the number of independent factors is very large, the binomial distribution is approximated by a normal curve.”

Paul Trow (http://ptrow.com/articles/Galton_June_07.htm)
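To see the approximation numerically, this sketch compares binomial probabilities with a normal curve having the same mean \( np \) and variance \( np(1-p) \). The choices \( n = 100 \), \( p = 0.5 \), and the cutoff 60 are illustrative; the tail comparison also uses a continuity correction (adding or subtracting 0.5 at the boundary), a standard refinement that the slides do not cover.

```r
n <- 100; p <- 0.5   # illustrative: 100 independent "factors"
k <- 0:n

exact  <- dbinom(k, size = n, prob = p)
# Normal curve with the binomial's mean n*p and variance n*p*(1-p)
approx <- dnorm(k, mean = n * p, sd = sqrt(n * p * (1 - p)))

# Largest pointwise gap between binomial probabilities and the normal curve
max_gap <- max(abs(exact - approx))
max_gap

# Pr[X >= 60]: exact binomial vs. normal with a continuity correction
exact_tail  <- pbinom(59, size = n, prob = p, lower.tail = FALSE)
approx_tail <- pnorm(59.5, mean = n * p, sd = sqrt(n * p * (1 - p)),
                     lower.tail = FALSE)
c(exact = exact_tail, normal_approx = approx_tail)
```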