- Confidence intervals (CIs) give a range of plausible values for a population parameter.
- The 95% confidence interval is the most commonly used in statistics.
- Understanding CIs helps us interpret statistical results more precisely.
When the population standard deviation \(\sigma\) is known, the confidence interval for a population mean is constructed using:
\[CI = \bar{X} \pm Z_{\alpha/2} \times \frac{\sigma}{\sqrt{n}}\]
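Where:
- \(\bar{X}\) is the sample mean
- \(Z_{\alpha/2}\) is the critical value of the standard normal distribution for a confidence level of \(1 - \alpha\)
- \(\sigma\) is the population standard deviation
- \(n\) is the sample size
For a 95% confidence level: \(\alpha = 0.05\), so \(Z_{0.025} = 1.96\)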
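
As a worked instance of the formula, here is a minimal R sketch. The sample mean, standard deviation, and sample size are hypothetical values chosen only for illustration, and \(\sigma\) is assumed to be known.

```r
# Hypothetical inputs, chosen only for illustration
x_bar <- 50    # sample mean
sigma <- 2.5   # population standard deviation (assumed known)
n     <- 25    # sample size
conf  <- 0.95  # confidence level

alpha  <- 1 - conf
z_crit <- qnorm(1 - alpha / 2)      # critical value, about 1.96 for 95%
margin <- z_crit * sigma / sqrt(n)  # margin of error

ci <- c(lower = x_bar - margin, upper = x_bar + margin)
print(round(ci, 2))  # roughly 49.02 to 50.98
```

Using `qnorm()` rather than hard-coding 1.96 makes it easy to switch to another confidence level by changing `conf`.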
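To calculate the confidence interval, we use the Z-score. For a sample mean, the Z-score measures how many standard errors \(\bar{X}\) lies from the population mean \(\mu\). In a standard normal distribution, the area under the curve beyond a given Z-score gives the proportion of the distribution above or below that value.
\[Z = \frac{\bar{X} - \mu}{\sigma / \sqrt{n}}\]
The range from \(Z = -1.96\) to \(Z = 1.96\) captures the central 95% of the standard normal distribution, leaving 2.5% in each tail.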
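
The critical values can be checked directly in R with the base `qnorm()` and `pnorm()` functions. This short sketch recovers the critical value for each common confidence level and confirms the central area it encloses; the variable names are illustrative.

```r
conf_levels <- c(0.90, 0.95, 0.99)

# Critical value that leaves (1 - conf) / 2 in each tail
z_crit <- qnorm(1 - (1 - conf_levels) / 2)

# Area of the standard normal between -z and +z
central_area <- pnorm(z_crit) - pnorm(-z_crit)

check <- data.frame(conf_level   = conf_levels,
                    z_crit       = round(z_crit, 3),
                    central_area = central_area)
print(check)  # z_crit column: 1.645, 1.960, 2.576
```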
What a 95% CI means: If we repeated our sampling process many times and built an interval from each sample, about 95% of those intervals would contain the true population parameter.
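
This repeated-sampling interpretation can be illustrated with a small simulation. The sketch below assumes a normal population with a hypothetical mean of 10 and a known \(\sigma\) of 2.5, draws many samples, and counts how often the interval contains the true mean.

```r
set.seed(42)  # for reproducibility

# Hypothetical population and design, chosen only for illustration
mu    <- 10
sigma <- 2.5
n     <- 30
reps  <- 10000
z_crit <- qnorm(0.975)

# For each simulated sample, record whether its 95% CI contains mu
covers <- replicate(reps, {
  x <- rnorm(n, mean = mu, sd = sigma)
  margin <- z_crit * sigma / sqrt(n)
  lower <- mean(x) - margin
  upper <- mean(x) + margin
  lower <= mu && mu <= upper
})

coverage <- mean(covers)  # proportion of intervals containing mu
print(coverage)           # should be close to 0.95
```

Each individual interval either contains `mu` or it does not; the 95% figure describes the long-run proportion across repetitions.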
Common misconceptions:
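It does not mean there’s a 95% probability the true value is in this specific range. The true value is fixed, so a given interval either contains it or does not; the 95% describes the procedure, whose intervals vary across samples.
This formula assumes that the population is normal or that the sample is large enough for the sample mean to be approximately normal, and that \(\sigma\) is known. These assumptions are a limitation.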
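library(plotly)  # provides plot_ly(), layout(), and the %>% pipe
# Grid of sample sizes and critical values
n_vals <- seq(10, 200, by = 10)
z_vals <- c(1.645, 1.96, 2.576)  # critical values for 90%, 95%, 99%
sigma <- 2.5                     # population standard deviation (assumed known)
# CI width = 2 * z * sigma / sqrt(n) for every (n, z) combination
grid_data <- expand.grid(n = n_vals, z = z_vals)
grid_data$ci_width <- 2 * grid_data$z * (sigma / sqrt(grid_data$n))
# expand.grid varies n fastest, so rows index n and columns index z
ci_matrix <- matrix(grid_data$ci_width,
                    nrow = length(n_vals),
                    ncol = length(z_vals))
# A plotly surface expects one row per y value and one column per x value,
# so the matrix is transposed before plotting
plot_ly(x = n_vals, y = z_vals, z = t(ci_matrix),
        type = "surface") %>%
  layout(scene = list(
    xaxis = list(title = "Sample Size"),
    yaxis = list(title = "Critical Value (z)"),
    zaxis = list(title = "CI Width")))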
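
The shape of this surface follows from the width formula \(2 Z_{\alpha/2}\,\sigma/\sqrt{n}\): the width shrinks with \(\sqrt{n}\) and grows with the critical value. A quick numeric check, using a hypothetical helper `ci_width()` that is not part of the code above:

```r
# Hypothetical helper, introduced only for this check
ci_width <- function(n, z, sigma) 2 * z * sigma / sqrt(n)

sigma <- 2.5
w_25  <- ci_width(n = 25,  z = 1.96,  sigma = sigma)  # 95% level, n = 25
w_100 <- ci_width(n = 100, z = 1.96,  sigma = sigma)  # 95% level, n = 100
w_99  <- ci_width(n = 25,  z = 2.576, sigma = sigma)  # 99% level, n = 25

# Quadrupling n halves the width; raising the confidence level widens it
print(c(w_25 = w_25, w_100 = w_100, ratio = w_25 / w_100, w_99 = w_99))
```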