2026-02-09
Independence is statistical independence - the outcome of one event does not affect our belief about the probability of another event
- If we draw a number from a hat, then flip a coin, the hat draw does not affect the value of the coin toss
- X does not affect Y - the outcome of X does not affect our belief about the probability of Y
If X does affect Y
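A minimal simulation sketch of the hat-and-coin example, contrasting the independent case with a dependent one. The hat size (numbers 1 to 10), the number of trials, and the `biased_coin` whose chance of heads depends on the hat draw are all assumptions made up for illustration; they are not from the lecture.

```r
set.seed(123)
n <- 100000

# Draw a number from a hat (assumed to hold 1 to 10), then flip a fair coin (1 = heads)
hat  <- sample(1:10, n, replace = TRUE)
coin <- rbinom(n, size = 1, prob = 0.5)

# Independent case: the share of heads is about 0.5 whatever the hat draw was
p_heads_by_hat <- tapply(coin, hat, mean)
round(p_heads_by_hat, 3)

# Hypothetical dependent case: a made-up coin whose chance of heads
# rises with the hat draw (hat / 11), so X does affect Y
biased_coin <- rbinom(n, size = 1, prob = hat / 11)
p_biased_by_hat <- tapply(biased_coin, hat, mean)
round(p_biased_by_hat, 3)
```

In the first table, knowing the hat draw tells us nothing new about the coin. In the second, it changes our belief about the probability of heads.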
So…
All outcomes are equally likely
- The probability of drawing any one value is 1/10
- Any deviation from a count of 10,000 is a random deviation from the expected count
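A short sketch of the equal-probability draws, assuming 10 possible values and 100,000 draws (these numbers are inferred from the 1/10 probability and the expected count of 10,000, not stated directly).

```r
set.seed(123)

# Draw 100,000 values from 1 to 10, each equally likely
draws  <- sample(1:10, 100000, replace = TRUE)
counts <- table(draws)
counts

# Expected count per value if all outcomes are equally likely
expected <- length(draws) / 10

# Random deviations of each observed count from the expected count
counts - expected
```

The deviations should be small relative to 10,000 and show no pattern across values.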
set.seed(123)
# plot a normal distribution with mean = 0, sd = 1, 100,000 random draws, 200 breaks, and red color
# add lines to the plot to illustrate the 68-95-99.7 rule
rand.norm <- rnorm(100000)
h <- hist(rand.norm, breaks = 200, freq = TRUE,
          main = "normal distribution, mean = 0, sd = 1, 100,000 random draws",
          xlab = "x", col = "red")
abline(v = c(-1, 1), col = "blue", lwd = 2)
abline(v = c(-2, 2), col = "blue", lwd = 2)
abline(v = c(-3, 3), col = "blue", lwd = 2)
# create labels to indicate the percentage within each range
# get max y from the histogram for label placement
ymax <- max(h$counts)
# add labels for the 68-95-99.7 rule
text(1.1, ymax*0.7, "68%", pos = 4, col = "blue")
text(2.1, ymax*0.15, "95%", pos = 4, col = "blue")
text(3.1, ymax*0.05, "99.7%", pos = 4, col = "blue")
# add arrows to illustrate the range of values within each standard deviation range
## arrows from +sd line to −sd line at same y-level
arrows(x0 = 1, y0 = ymax*0.7, x1 = -1, y1 = ymax*0.7,
       code = 3, angle = 15, length = 0.08, col = "blue") # 68% range
arrows(x0 = 2, y0 = ymax*0.15, x1 = -2, y1 = ymax*0.15,
       code = 3, angle = 15, length = 0.08, col = "blue") # 95% range
arrows(x0 = 3, y0 = ymax*0.05, x1 = -3, y1 = ymax*0.05,
       code = 3, angle = 15, length = 0.08, col = "blue") # 99.7% range
What happens if we do the same thing above but do it 1,000 times and plot the counts?
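One possible reading of this question, sketched under assumptions: "the same thing" is taken to mean the equal-probability draws (10 values, 100,000 draws), repeated 1,000 times, and we plot how often the first value comes up in each repetition. `rmultinom()` is used here as a fast stand-in for repeating `sample()` 1,000 times.

```r
set.seed(123)

# Each column is one repetition: the counts of the 10 values in 100,000 draws
count_matrix <- rmultinom(n = 1000, size = 100000, prob = rep(1 / 10, 10))

# Counts of the first value across the 1,000 repetitions
counts_value1 <- count_matrix[1, ]
mean(counts_value1)  # close to the expected count of 10,000
sd(counts_value1)    # typical size of a random deviation

# The counts pile up around 10,000 in a roughly bell-shaped histogram
hist(counts_value1, breaks = 30, col = "red",
     main = "count of one value, 1,000 repetitions", xlab = "count")
abline(v = 10000, col = "blue", lwd = 2)
```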
The 68-95-99.7 rule
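The rule can be checked numerically. This sketch compares the theoretical probabilities from `pnorm()` with the shares observed in 100,000 simulated standard normal draws; the variable names are illustrative.

```r
# Theoretical probability of falling within k standard deviations of the mean
k <- 1:3
theoretical <- pnorm(k) - pnorm(-k)
round(theoretical, 4)  # 0.6827 0.9545 0.9973

# Empirical shares from simulated draws (mean = 0, sd = 1)
set.seed(123)
x <- rnorm(100000)
empirical <- sapply(k, function(s) mean(abs(x) <= s))
round(empirical, 4)
```

The simulated shares should land very close to 68%, 95%, and 99.7%.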
A statistic is a measure calculated from a sample of data
- e.g., sample mean, sample variance, sample standard deviation
A parameter is a measure calculated from the entire population
- e.g., population mean, population variance, population standard deviation
set.seed(123)
rand.poiss <- rpois(100000, 1)
hp <- hist(rand.poiss, breaks = 200, freq = TRUE,
           main = "poisson distribution, lambda = 1, 100,000 draws",
           xlab = "x", col = "red")
# get max y from the histogram for label and arrow placement
ypmax <- max(hp$counts)
# labels for the cumulative probabilities P(X <= 1), P(X <= 2), and P(X <= 3)
text(1.1, ypmax*0.9, "74%", pos = 4, col = "blue")
text(2.1, ypmax*0.5, "92%", pos = 4, col = "blue")
text(3.1, ypmax*0.2, "98%", pos = 4, col = "blue")
# arrows spanning from 0 to 1, 2, and 3, each at the same height as its label
arrows(x0 = 0.0, y0 = ypmax*0.9, x1 = 1, y1 = ypmax*0.9,
       code = 3, angle = 15, length = 0.08, col = "blue")
arrows(x0 = 0.0, y0 = ypmax*0.5, x1 = 2, y1 = ypmax*0.5,
       code = 3, angle = 15, length = 0.08, col = "blue")
arrows(x0 = 0.0, y0 = ypmax*0.2, x1 = 3, y1 = ypmax*0.2,
       code = 3, angle = 15, length = 0.08, col = "blue")
# Set a seed for reproducibility
set.seed(123)
# Generate data
poisson_data <- rpois(1000, lambda = 1)
normal_data <- rnorm(100000, mean = 1, sd = 1)
# Create histogram for Poisson (density), with one unit-wide bin centered on each integer
h <- hist(poisson_data, probability = TRUE,
          breaks = seq(-0.5, max(poisson_data) + 0.5, by = 1),
          main = "Poisson(λ=1) vs. Normal(μ=1,σ=1)",
          xlab = "Value", ylab = "Density", ylim = c(0, 0.45),
          col = rgb(0.7, 0.9, 1, 0.7),
          xlim = c(-1, 6)) # extend x to show negatives
# Overlay normal density
lines(density(normal_data), col = "red", lwd = 2)
# Legend
legend("topright", legend = c("Poisson", "Normal"),
       col = c("lightblue", "red"), lty = 1, lwd = 2)
# Lines
abline(v = qnorm(0.975, 1, 1), col = "red", lwd = 2, lty = 2) # ~2.96
abline(v = qpois(0.98, 1), col = "blue", lwd = 2, lty = 2) # 3
# Labels
text(2.98, 0.35, "97.5%\nNormal", col = "red", pos = 4)
text(3.1, 0.28, "98%\nPoisson", col = "blue", pos = 4)
# Arrow pointing left from 97.5% line to negatives, with label
arrows(x0 = 2.8, y0 = 0.20, x1 = -0.3, y1 = 0.20,
       code = 2, angle = 20, length = 0.1, col = "red", lwd = 1.5)
text(0.5, 0.22, "97.5% includes\nnegative values", col = "red", cex = 0.8)
UH POLS3316, Spring 2026, Instructor: Tom Hanna
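The percentages and cutoffs used in the Poisson and Normal plots can be checked directly with R's distribution functions. This is a small sketch; the variable names are illustrative and not part of the original code.

```r
# Cumulative Poisson(lambda = 1) probabilities P(X <= k) for k = 1, 2, 3
pois_cum <- ppois(1:3, lambda = 1)
round(pois_cum, 3)  # 0.736 0.920 0.981

# Cutoffs drawn as dashed lines in the comparison plot
norm_cut <- qnorm(0.975, mean = 1, sd = 1)  # about 2.96
pois_cut <- qpois(0.98, lambda = 1)         # 3

# Share of the Normal(mean = 1, sd = 1) distribution that falls below zero;
# a Poisson count can never be negative
p_norm_negative <- pnorm(0, mean = 1, sd = 1)
round(p_norm_negative, 3)  # 0.159
```

About 16% of the Normal(1, 1) distribution lies below zero, which is why the Normal's 97.5% range includes negative values while the Poisson's does not.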