Note: This document was converted to R-Markdown from this page by M. Drew LaMar.
Probabilities under the normal curve:
pnorm(157.5, mean = 177.6, sd = 9.7)
Other new methods:
Normal approximation to a binomial distribution.
Calculate probabilities under the normal curve. The command pnorm(Y) gives the probability of obtaining a value less than \(Y\) under the normal distribution. The arguments mean and sd give the mean and standard deviation of the desired normal distribution.
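As a sketch of what the mean and sd arguments do, the same probability can be obtained by first converting 157.5 to a standard normal deviate, \(Z = (Y - \mu)/\sigma\), and then calling pnorm with its defaults (mean = 0, sd = 1). The variable names Z, pnormStandard and pnormDirect are introduced here only for illustration.

```r
# Convert the height 157.5 to a standard normal deviate (Z-score)
Z <- (157.5 - 177.6) / 9.7

# pnorm() defaults to the standard normal distribution (mean = 0, sd = 1)
pnormStandard <- pnorm(Z)

# Supplying mean and sd directly gives the same probability
pnormDirect <- pnorm(157.5, mean = 177.6, sd = 9.7)

c(Z = Z, standard = pnormStandard, direct = pnormDirect)
```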
Pr[Height < 157.5]
pnorm(157.5, mean = 177.6, sd = 9.7)
## [1] 0.01912503
Pr[Height > 190.54]
1 - pnorm(190.54, mean = 177.6, sd = 9.7)
## [1] 0.09109855
Pr[Height < 157.5 or Height > 190.54]
pnorm(157.5, mean = 177.6, sd = 9.7) + 1 - pnorm(190.54, mean = 177.6, sd = 9.7)
## [1] 0.1102236
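pnorm also has a lower.tail argument. Setting lower.tail = FALSE returns the upper-tail probability Pr[Height > Y] directly, rather than computing it as 1 - pnorm(...). The following sketch rewrites the two-tailed calculation this way. The names lowerTail and upperTail are only for illustration.

```r
# Pr[Height < 157.5]: lower tail (the default)
lowerTail <- pnorm(157.5, mean = 177.6, sd = 9.7)

# Pr[Height > 190.54]: upper tail requested directly
upperTail <- pnorm(190.54, mean = 177.6, sd = 9.7, lower.tail = FALSE)

# The two events cannot both occur, so their probabilities add
lowerTail + upperTail
```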
Demonstration of the central limit theorem, using the distribution of sample mean age at death in samples from a highly non-normal distribution: the frequency distribution of age at death in Switzerland in 1918 during the Spanish flu epidemic.
Read and inspect the data.
flu <- read.csv(url("http://whitlockschluter.zoology.ubc.ca/wp-content/data/chapter10/chap10e6AgesAtDeathSpanishFlu1918.csv"))
head(flu)
## age
## 1 0
## 2 0
## 3 0
## 4 0
## 5 0
## 6 0
Histogram showing the frequency distribution of ages at death in Switzerland in 1918 during the Spanish flu epidemic.
hist(flu$age, right = FALSE)
Commands for a histogram with more options:
hist(flu$age, right = FALSE, breaks = seq(0,102,2), col = "firebrick", las = 1, xlab = "Age at death (yrs)", ylab = "Frequency", main = "")
Demonstrate the central limit theorem. Treat the age-at-death measurements from Switzerland in 1918 as the population. Take a large number of random samples, each of size \(n\), from this population and plot the sample means. Note: your results won't be identical to those in Figure 10.6-2. The samples are random, and 10,000 random samples are not enough for extreme accuracy. Change \(n\) below to another number and rerun to see how sample size affects the shape of the distribution of sample means.
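n <- 4                      # sample size; change this and rerun
results <- numeric(10000)   # preallocate storage for 10,000 sample means
for(i in 1:10000) {
  # draw one random sample of size n from the population and store its mean
  AgeSample <- sample(flu$age, size = n, replace = FALSE)
  results[i] <- mean(AgeSample)
}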
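With the flu data loaded, the loop could equivalently be written as results <- replicate(10000, mean(sample(flu$age, size = n))). The spread of the sample means is also expected to be close to the standard error \(\sigma/\sqrt{n}\), where \(\sigma\) is the population standard deviation. The sketch below runs without downloading the data. It uses a hypothetical stand-in population, an exponential distribution with mean and standard deviation of 30 years, sampled directly with rexp(). This is an assumption for illustration, not the Swiss data.

```r
set.seed(1)   # make the random draws reproducible

n <- 4
sigma <- 30   # sd of the hypothetical exponential stand-in (rate = 1/30)

# 10,000 sample means, each from a random sample of size n
means <- replicate(10000, mean(rexp(n, rate = 1 / sigma)))

# Observed spread of the sample means versus the prediction sigma / sqrt(n)
c(observed = sd(means), predicted = sigma / sqrt(n))
```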
Histogram of the sample means, with options.
hist(results, right = FALSE, breaks = 50, col = "firebrick", las = 1, xlab = "Mean age at death (yrs)", ylab = "Frequency", main = "")
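You can also see the effect of sample size on shape without editing and rerunning. One way is to loop over several values of \(n\) and compute the skewness of the sample means each time. This sketch again uses the hypothetical exponential stand-in rather than the flu data. It also relies on a small helper, skewness(), defined here; this is not a base R function. For an exponential population, the skewness of the mean of \(n\) observations is \(2/\sqrt{n}\). The values should therefore shrink toward 0, the skewness of a normal distribution, as \(n\) grows.

```r
set.seed(2)

# Sample skewness: average cubed deviation divided by the cubed sd (helper defined here)
skewness <- function(x) mean((x - mean(x))^3) / sd(x)^3

sampleSizes <- c(1, 4, 16, 64)

# Skewness of 10,000 sample means for each sample size
skewByN <- sapply(sampleSizes, function(n) {
  means <- replicate(10000, mean(rexp(n, rate = 1 / 30)))
  skewness(means)
})

# Theory for an exponential population: skewness = 2 / sqrt(n)
data.frame(n = sampleSizes, observed = round(skewByN, 2), expected = 2 / sqrt(sampleSizes))
```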
Normal approximation to the binomial distribution applied to the brown recluse spider example.
The \(P\)-value from the binomial test is \(P = 2 Pr[X \geq 31]\), which is the same as \(2 (1 - Pr[X \leq 30])\), since \(Pr[X \geq 31] = 1 - Pr[X \leq 30]\). We can use the normal approximation as follows, adding \(1/2\) to 30 as a continuity correction. Remember that \(n = 41\) and \(p = 0.5\).
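Written out, the approximation uses the binomial mean \(np\) and standard deviation \(\sqrt{np(1-p)}\), together with the continuity correction:

\[
\mu = np = 41 \times 0.5 = 20.5, \qquad \sigma = \sqrt{np(1-p)} = \sqrt{41 \times 0.5 \times 0.5} = \sqrt{10.25} \approx 3.20
\]

\[
Pr[X \geq 31] = 1 - Pr[X \leq 30] \approx 1 - Pr\left[Z < \frac{30 + \tfrac{1}{2} - 20.5}{\sqrt{10.25}}\right] \approx 1 - Pr[Z < 3.123] \approx 0.00089
\]

Doubling gives \(P \approx 0.0018\).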
spiderProb <- 1 - pnorm( (30 + 1/2 - 41 * 0.5) / sqrt(41 * 0.50 * 0.5))
Pvalue <- 2 * spiderProb
Pvalue
## [1] 0.001787289
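Because pnorm accepts mean and sd arguments, the same approximation can be written without standardizing by hand, mirroring the height example. The lower.tail = FALSE argument returns the upper tail directly. In this sketch, nTrials, pNull and upperApprox are illustrative names; they are chosen so that the n used in the central limit theorem demonstration is not overwritten.

```r
nTrials <- 41   # number of trials in the spider example
pNull <- 0.5    # probability of success under the null hypothesis

# Pr[X >= 31] is approximately Pr[Y > 30.5] for Y ~ Normal(np, sqrt(np(1 - p)))
upperApprox <- pnorm(30 + 1/2, mean = nTrials * pNull,
                     sd = sqrt(nTrials * pNull * (1 - pNull)), lower.tail = FALSE)

2 * upperApprox   # two-sided P-value
```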
Compare with the result obtained using the binomial distribution function dbinom, which we encountered in the Chapter 7 R page.
2 * sum( dbinom(31:41, size = 41, prob = 0.5) )
## [1] 0.001450491
Or use pbinom, which here gives the cumulative probability \(Pr[X \leq 30]\).
2 * (1 - pbinom(30, size = 41, prob = 0.5))
## [1] 0.001450491
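Base R's binom.test() performs the exact binomial test in a single call. When \(p = 0.5\) the binomial distribution is symmetric, so its two-sided \(P\)-value should equal \(2 Pr[X \geq 31]\). This sketch assumes 31 successes out of 41 trials, matching the calculation above. The name spiderTest is illustrative.

```r
# Exact two-sided binomial test: 31 successes in 41 trials, null p = 0.5
spiderTest <- binom.test(31, n = 41, p = 0.5)
spiderTest$p.value

# Same value as doubling the upper tail; lower.tail = FALSE gives Pr[X > 30] = Pr[X >= 31]
2 * pbinom(30, size = 41, prob = 0.5, lower.tail = FALSE)
```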