Introduction

  • Confidence intervals (CIs) give a range of plausible values for a population parameter.
  • The 95% confidence level is the most commonly used in statistics.
  • Understanding CIs helps us interpret statistical results more precisely.

Mathematical Foundation

The confidence interval for a population mean is constructed using:

\[CI = \bar{X} \pm Z_{\alpha/2} \times \frac{\sigma}{\sqrt{n}}\]

Where:

  • \(\bar{X}\) = sample mean
  • \(Z_{\alpha/2}\) = critical value from standard normal distribution
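  • \(\sigma\) = population standard deviation (when \(\sigma\) is unknown, the sample SD \(S\) is used instead; for small \(n\) this calls for a \(t\) critical value rather than \(Z\))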
  • \(n\) = sample size

For a 95% confidence level, \(\alpha = 0.05\), so \(Z_{\alpha/2} = Z_{0.025} = 1.96\)
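The formula can be checked numerically with a minimal R sketch. It assumes \(\sigma\) is known and uses hypothetical summary values (\(\bar{X} = 64\), \(n = 100\)); \(\sigma = 2.5\) matches the value used in the plotting code later in this deck. The helper name `z_ci` is illustrative only.

```r
# Z-based confidence interval for a population mean, assuming sigma is known.
# z_ci is a hypothetical helper written for this example.
z_ci <- function(x_bar, sigma, n, level = 0.95) {
  alpha <- 1 - level
  z_crit <- qnorm(1 - alpha / 2)       # about 1.96 when level = 0.95
  margin <- z_crit * sigma / sqrt(n)   # Z_{alpha/2} * sigma / sqrt(n)
  c(lower = x_bar - margin, upper = x_bar + margin)
}

# Hypothetical sample summary: mean 64 inches, sigma 2.5, n = 100
ci <- z_ci(x_bar = 64, sigma = 2.5, n = 100)
print(round(ci, 3))   # lower 63.51, upper 64.49
```

Using `qnorm()` instead of a hard-coded 1.96 lets the same helper produce 90% or 99% intervals by changing `level`.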

Method and Definition

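  • To calculate the confidence interval, we use the Z-score. In a normal distribution, the Z-score of a value measures how many standard deviations it lies from the mean, and the standard normal cumulative probability at that Z gives the proportion of the population below the value (one minus it gives the proportion above).

  • Standardizing the sample mean: \(Z = \frac{\bar{X} - \mu}{S / \sqrt{n}}\), which is approximately standard normal when \(n\) is large

  • The values Z = ±1.96 bound the central 95% of the standard normal distribution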
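A short R sketch can confirm these facts with the base functions `qnorm()` and `pnorm()`. The sample values used for standardizing (\(\bar{X} = 64.3\), \(\mu = 64\), \(S = 2.5\), \(n = 100\)) are hypothetical.

```r
# Critical value and central area of the standard normal distribution
z_crit <- qnorm(0.975)                       # upper 2.5% point, about 1.96
central_area <- pnorm(1.96) - pnorm(-1.96)   # area between -1.96 and +1.96, about 0.95

# Standardizing a sample mean (hypothetical values)
x_bar <- 64.3
mu <- 64
s <- 2.5
n <- 100
z_stat <- (x_bar - mu) / (s / sqrt(n))       # (X-bar - mu) / (S / sqrt(n)) = 1.2

print(c(z_crit = z_crit, central_area = central_area, z_stat = z_stat))
```

Here the sample mean sits 1.2 standard errors above \(\mu\), which is inside the ±1.96 band.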

Explanation and Deficiencies

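  • What a 95% CI means: if we repeated our sampling process many times, about 95% of the confidence intervals created would contain the true population parameter.

  • Common misconceptions:

      • It does not mean there is a 95% probability that the true value lies in this specific range. The parameter is fixed while the interval varies from sample to sample, so any one interval either contains the true value or it does not; the 95% describes the long-run success rate of the procedure.

  • We must assume a normally distributed population, or a sample large enough for the Central Limit Theorem to apply, which is a limitation.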
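The repeated-sampling interpretation can be illustrated with a small simulation. This is a sketch under stated assumptions: the population is normal with a hypothetical true mean of 64 inches and \(\sigma = 2.5\), each sample has \(n = 30\), and \(\sigma\) is treated as known. The seed is arbitrary.

```r
# Simulate many samples, build a 95% Z interval from each,
# and count how often the interval contains the true mean.
set.seed(42)              # arbitrary seed for reproducibility
mu <- 64                  # hypothetical true mean (inches)
sigma <- 2.5              # assumed known population SD
n <- 30
n_reps <- 10000
z_crit <- qnorm(0.975)

covered <- replicate(n_reps, {
  x <- rnorm(n, mean = mu, sd = sigma)
  margin <- z_crit * sigma / sqrt(n)
  (mean(x) - margin <= mu) && (mu <= mean(x) + margin)
})

coverage <- mean(covered)  # proportion of intervals that captured mu
print(coverage)            # should be close to 0.95
```

Each individual interval either captures `mu` or misses it; only the proportion across repetitions approaches 95%.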

Random Distribution

  • This models the distribution of heights (in inches) of women in the United States
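A sketch of how such a sample might be simulated and summarized in R. The population mean of 63.5 inches is a hypothetical value chosen for illustration, not a figure from this deck; \(\sigma = 2.5\) matches the plotting code below. Because \(\sigma\) is estimated by the sample SD \(S\), the Z critical value is reasonable here only because \(n = 200\) is large.

```r
# Simulated sample of women's heights (hypothetical population parameters)
set.seed(1)                    # arbitrary seed
mu_assumed <- 63.5             # hypothetical population mean (inches)
sigma <- 2.5                   # population SD, as in the 3D plot code
heights <- rnorm(200, mean = mu_assumed, sd = sigma)

x_bar <- mean(heights)
s <- sd(heights)               # sample SD, S
n <- length(heights)
margin <- qnorm(0.975) * s / sqrt(n)
ci <- c(lower = x_bar - margin, upper = x_bar + margin)
print(round(ci, 2))

# Histogram of the simulated sample
hist(heights, breaks = 20, main = "Simulated heights", xlab = "Height (inches)")
```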

Population Distribution

Sample Size Effect on CI Width
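The width of the interval is \(2 Z_{\alpha/2} \sigma / \sqrt{n}\), so it shrinks with the square root of the sample size: quadrupling \(n\) halves the width. A minimal R check, assuming \(\sigma = 2.5\) as in the plotting code below (the helper `ci_width` is hypothetical):

```r
# CI width = 2 * Z_{alpha/2} * sigma / sqrt(n)
ci_width <- function(n, sigma = 2.5, level = 0.95) {
  z_crit <- qnorm(1 - (1 - level) / 2)
  2 * z_crit * sigma / sqrt(n)
}

widths <- ci_width(c(25, 100, 400))
print(round(widths, 3))        # 1.960 0.980 0.490
print(widths[1] / widths[2])   # 4x the sample size gives half the width
```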

3D: CI Across Sample Sizes

Code: Creating the 3D Plot

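library(plotly)  # provides plot_ly() and layout(), and re-exports the %>% pipe

n_vals <- seq(10, 200, by = 10)
z_vals <- c(1.645, 1.96, 2.576)  # critical values for 90%, 95%, 99%
sigma <- 2.5                     # population SD (inches)

# Every combination of sample size and critical value
grid_data <- expand.grid(n = n_vals, z = z_vals)
# CI width = 2 * Z * sigma / sqrt(n)
grid_data$ci_width <- 2 * grid_data$z * (sigma / sqrt(grid_data$n))

# expand.grid varies n fastest, so filling column-wise gives rows = n, columns = z
ci_matrix <- matrix(grid_data$ci_width,
                    nrow = length(n_vals),
                    ncol = length(z_vals))

# Surface z-matrices map rows to y and columns to x, so transpose
# to a length(z_vals) x length(n_vals) matrix
plot_ly(x = n_vals, y = z_vals, z = t(ci_matrix),
        type = "surface") %>%
  layout(scene = list(
    xaxis = list(title = "Sample Size"),
    yaxis = list(title = "Critical Value Z (90%, 95%, 99%)"),
    zaxis = list(title = "CI Width")))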