In this presentation, I explore the concept of confidence intervals with examples.

Introduction to Confidence Intervals

  • A confidence interval is a range of values, derived from sample statistics, that is likely to contain the true population parameter. It quantifies the uncertainty of an estimate by attaching a margin of error to a statistical inference. In this presentation, we will explore confidence intervals using both 2D and 3D visualizations, as well as mathematical analysis.

What is a Confidence Interval?

  • A confidence interval is constructed to estimate a population parameter with a given level of confidence. The confidence level describes the procedure rather than any single interval: if we drew 100 independent samples from the same population and built a 95% confidence interval from each, we would expect about 95 of those intervals to contain the true population parameter. This long-run guarantee is what makes confidence intervals central to hypothesis testing and statistical inference.
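The repeated-sampling interpretation above can be checked with a small simulation. The sketch below is illustrative only: the seed, the normal population with mean 170 and standard deviation 10, and the number of repetitions are assumptions chosen for the demonstration.

```r
# Coverage simulation: how often does a 95% interval capture the true mean?
# The seed, population parameters, and repetition count are assumptions
# made for this illustration.
set.seed(42)

true_mean <- 170
true_sd   <- 10
n         <- 100     # observations per sample
n_repeats <- 1000    # number of independent samples
z_crit    <- qnorm(0.975)  # two-sided 95% critical value, about 1.96

covers <- replicate(n_repeats, {
  x <- rnorm(n, mean = true_mean, sd = true_sd)
  half_width <- z_crit * sd(x) / sqrt(n)
  lower <- mean(x) - half_width
  upper <- mean(x) + half_width
  # TRUE when this sample's interval contains the true mean
  lower <= true_mean && true_mean <= upper
})

# Proportion of intervals that contain the true mean
coverage <- mean(covers)
print(coverage)
```

The printed proportion should land close to 0.95, although it varies from run to run and with the chosen seed.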

Confidence Interval Formula

Confidence intervals can be calculated using the following formula:

\[ CI = \bar{X} \pm Z_{\alpha/2} \cdot \frac{\sigma}{\sqrt{n}} \]

Where:

  • \(\bar{X}\) is the sample mean.
  • \(Z_{\alpha/2}\) is the critical value of the standard normal (Z) distribution for the desired confidence level \(1 - \alpha\); for a 95% interval, \(Z_{0.025} \approx 1.96\).
  • \(\sigma\) is the population standard deviation.
  • \(n\) is the sample size.
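As a minimal sketch, the formula above translates directly into R. The function name `z_confidence_interval` and the input values are hypothetical and chosen only for illustration; `qnorm()` is base R's standard normal quantile function.

```r
# Z-based confidence interval for a mean when the population standard
# deviation (sigma) is known. The function name is hypothetical.
z_confidence_interval <- function(x_bar, sigma, n, level = 0.95) {
  alpha  <- 1 - level
  z_crit <- qnorm(1 - alpha / 2)       # Z_{alpha/2}
  margin <- z_crit * sigma / sqrt(n)   # margin of error
  c(lower = x_bar - margin, upper = x_bar + margin)
}

# Illustrative inputs: sample mean 170.9, sigma 10, n = 100
ci <- z_confidence_interval(x_bar = 170.9, sigma = 10, n = 100)
print(round(ci, 2))
```

With these inputs the margin of error is about 1.96 × 10 / √100 = 1.96, so the interval runs from roughly 168.94 to 172.86.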

Sample Mean and Confidence Interval Calculation

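In practice the population standard deviation \(\sigma\) is rarely known, so it is estimated by the sample standard deviation \(s\), and the standard error \(s/\sqrt{n}\) replaces \(\sigma/\sqrt{n}\) in the formula above. For large samples the Z critical value remains a good approximation; for small samples the critical value is usually taken from the t-distribution with \(n - 1\) degrees of freedom. The sample mean and the standard error are computed as follows: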

\[ \text{Sample Mean} = \frac{1}{n} \sum_{i=1}^n X_i \]

\[ \text{Standard Error} = \frac{s}{\sqrt{n}} \]

Where \(s\) is the sample standard deviation, calculated as:

\[ s = \sqrt{\frac{1}{n-1} \sum_{i=1}^n (X_i - \bar{X})^2} \]
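The three formulas above can be written out step by step and compared with R's built-in `mean()` and `sd()`. The five values in `x` are made-up numbers used only to keep the arithmetic easy to follow.

```r
# Hypothetical sample of five heights, used only for illustration
x <- c(168, 172, 165, 180, 175)
n <- length(x)

# Sample mean: (1/n) * sum of the observations
sample_mean <- sum(x) / n

# Sample standard deviation with the n - 1 denominator
s <- sqrt(sum((x - sample_mean)^2) / (n - 1))

# Standard error of the mean
standard_error <- s / sqrt(n)

# The manual results agree with the built-in functions
stopifnot(isTRUE(all.equal(sample_mean, mean(x))))
stopifnot(isTRUE(all.equal(s, sd(x))))

print(c(mean = sample_mean, sd = s, se = standard_error))
```

Here the sample mean is 172 and the squared deviations sum to 138, so \(s = \sqrt{138/4} \approx 5.87\) and the standard error is about 2.63.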

Calculating the CI for Sample Mean

  • This slide shows the calculated confidence interval for a random sample of 100 data points, where the data was generated to have a mean of 170 and a standard deviation of 10.
  • The Sample Mean represents the average height of the individuals in the sample, and we calculate the 95% Confidence Interval for this mean. This interval provides a range in which we can expect the true population mean to lie, with a 95% level of confidence.
  • The lower bound (CI_Lower) and upper bound (CI_Upper) of this confidence interval are shown below:

95% Confidence Interval for Sample Mean

Sample_Mean   CI_Lower   CI_Upper
170.9         169.11     172.69
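A table like this one could be produced along the following lines. This is a sketch under stated assumptions, not the presentation's actual code: the seed and the use of `rnorm()` are guesses, so the printed numbers need not match the values above. The variable names `data`, `sample_mean`, `ci_lower`, and `ci_upper` follow the plotting code used in this presentation.

```r
# Assumed seed and generator; the presentation does not state either
set.seed(123)
data <- rnorm(100, mean = 170, sd = 10)

sample_mean    <- mean(data)
standard_error <- sd(data) / sqrt(length(data))
z_crit         <- qnorm(0.975)  # two-sided 95% critical value

ci_lower <- sample_mean - z_crit * standard_error
ci_upper <- sample_mean + z_crit * standard_error

# One-row summary table with the same column names as the slide
result <- data.frame(
  Sample_Mean = round(sample_mean, 2),
  CI_Lower    = round(ci_lower, 2),
  CI_Upper    = round(ci_upper, 2)
)
print(result)
```

For a smaller sample, replacing `qnorm(0.975)` with `qt(0.975, df = length(data) - 1)` would give the slightly wider t-based interval.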

R Code for the Histogram

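  • The following R code was used to create the histogram on the previous slide. It shows the distribution of the sample data, with the sample mean marked in blue and the bounds of the 95% confidence interval marked in red.

```r
library(ggplot2)

# `data`, `sample_mean`, `ci_lower`, and `ci_upper` are assumed to have
# been computed earlier in the session.
ggplot(data.frame(data), aes(x = data)) +
  # Histogram on the density scale (after_stat() replaces the deprecated ..density..)
  geom_histogram(aes(y = after_stat(density)), bins = 20,
                 color = "black", fill = "lightblue") +
  # Sample mean (blue) and confidence interval bounds (red);
  # `linewidth` replaces `size` for lines in ggplot2 3.4.0 and later
  geom_vline(xintercept = sample_mean, color = "blue",
             linetype = "dashed", linewidth = 1) +
  geom_vline(xintercept = ci_lower, color = "red",
             linetype = "dashed", linewidth = 0.8) +
  geom_vline(xintercept = ci_upper, color = "red",
             linetype = "dashed", linewidth = 0.8) +
  labs(title = "Sample Data with 95% Confidence Interval for the Mean",
       x = "Sample Data", y = "Density")
```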