Goal

In this tutorial, we’ll learn how to construct confidence intervals for a population mean when the population standard deviation is unknown, using the t-distribution.

Formula

\[ \text{CI} = \bar{x} \pm t^* \cdot \left( \frac{s}{\sqrt{n}} \right) \]

$\bar{x}$: Sample mean
$s$: Sample standard deviation
$n$: Sample size
$t^*$: Critical value from the t-distribution for the chosen confidence level (for a 95% interval, the 0.975 quantile)
$df = n - 1$: Degrees of freedom used to look up $t^*$
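To make the role of $t^*$ and the degrees of freedom more concrete, the short sketch below compares two-sided 95% critical values from the t-distribution with the corresponding normal value. The sample sizes are illustrative assumptions, not values from this tutorial.

```r
# Two-sided 95% critical values: t* for several sample sizes versus the normal z*
conf_level <- 0.95
alpha <- 1 - conf_level
n_values <- c(5, 12, 30, 100, 1000)   # illustrative sample sizes (assumed)

t_star <- qt(1 - alpha / 2, df = n_values - 1)   # one t* per sample size
z_star <- qnorm(1 - alpha / 2)                   # normal critical value

print(data.frame(n = n_values, df = n_values - 1, t_star = round(t_star, 3)))
cat("z* =", round(z_star, 3), "\n")
```

The t critical value is always larger than z*, which widens the interval to account for estimating σ with s, and it shrinks toward z* as the sample size grows.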

Example 1: Palmer Penguins

# Load dataset
library(palmerpenguins)
data(penguins)

# Clean data: na.omit() drops every row with a missing value in any column
penguins_clean <- na.omit(penguins)

# Summary statistics for flipper length
x_bar <- mean(penguins_clean$flipper_length_mm)
s <- sd(penguins_clean$flipper_length_mm)
n <- length(penguins_clean$flipper_length_mm)

# 95% Confidence Interval
alpha <- 0.05
t_star <- qt(1 - alpha/2, df = n - 1)
margin_error <- t_star * (s / sqrt(n))

ci_lower <- x_bar - margin_error
ci_upper <- x_bar + margin_error

cat("95% CI for Flipper Length:", round(ci_lower, 2), "to", round(ci_upper, 2), "mm")

## 95% CI for Flipper Length: 199.46 to 202.48 mm

Example 2: Student Study Hours

# Sample data
study_hours <- c(10, 13, 14, 15, 12, 11, 17, 14, 13, 16, 18, 15)

# Compute stats
x_bar <- mean(study_hours)
s <- sd(study_hours)
n <- length(study_hours)

# 99% CI
alpha <- 0.01
t_star <- qt(1 - alpha/2, df = n - 1)
margin_error <- t_star * (s / sqrt(n))

ci_lower <- x_bar - margin_error
ci_upper <- x_bar + margin_error

cat("99% CI for Study Hours:", round(ci_lower, 2), "to", round(ci_upper, 2), "hours")

## 99% CI for Study Hours: 11.87 to 16.13 hours
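As a cross-check, the same interval can be obtained from base R's t.test(), which reports a t-based confidence interval for the mean. The sketch below recomputes the manual interval and compares the two; the variable names manual_ci and builtin_ci are chosen here for illustration.

```r
study_hours <- c(10, 13, 14, 15, 12, 11, 17, 14, 13, 16, 18, 15)

# Manual 99% interval, following the same steps as in the example
n <- length(study_hours)
margin_error <- qt(1 - 0.01 / 2, df = n - 1) * sd(study_hours) / sqrt(n)
manual_ci <- mean(study_hours) + c(-1, 1) * margin_error

# Interval reported by a one-sample t-test (attributes stripped for comparison)
builtin_ci <- as.numeric(t.test(study_hours, conf.level = 0.99)$conf.int)

print(round(manual_ci, 2))
print(round(builtin_ci, 2))
print(all.equal(manual_ci, builtin_ci))
```

Both approaches should agree, so t.test() is a convenient way to verify hand-computed bounds.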

Example 3: Simulated Data

Simulate 100 values from a normal distribution with mean 50 and sd 10. Compute a 90% confidence interval for the mean.

set.seed(123)
sim_data <- rnorm(100, mean = 50, sd = 10)

x_bar <- mean(sim_data)
s <- sd(sim_data)
n <- length(sim_data)

alpha <- 0.10
t_star <- qt(1 - alpha/2, df = n - 1)
margin_error <- t_star * (s / sqrt(n))

ci_lower <- x_bar - margin_error
ci_upper <- x_bar + margin_error

cat("90% CI for Simulated Data Mean:", round(ci_lower, 2), "to", round(ci_upper, 2))

## 90% CI for Simulated Data Mean: 49.39 to 52.42
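One way to see what the 90% confidence level means is to repeat the simulation many times and count how often the interval contains the true mean of 50. The sketch below is an illustration under assumed settings: the seed and the number of repetitions are arbitrary choices, not part of the example above.

```r
set.seed(2025)        # arbitrary seed, used only for reproducibility (assumed)
n_reps <- 2000        # number of simulated samples (assumed)
n <- 100              # sample size, as in the example
true_mean <- 50       # population mean used to generate the data
alpha <- 0.10         # 90% confidence level

# For each simulated sample, build the 90% CI and record whether it covers true_mean
covered <- replicate(n_reps, {
  x <- rnorm(n, mean = true_mean, sd = 10)
  margin <- qt(1 - alpha / 2, df = n - 1) * sd(x) / sqrt(n)
  (mean(x) - margin <= true_mean) && (true_mean <= mean(x) + margin)
})

cat("Share of 90% intervals containing the true mean:", mean(covered), "\n")
```

The printed share should land close to 0.90. It will not match exactly, because the number of repetitions is finite.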

Summary

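Use the t-distribution when the population standard deviation σ is unknown and is estimated by the sample standard deviation s.
Use qt(1 - alpha/2, df = n - 1) to find the critical value for a two-sided interval.
A confidence interval gives a range of plausible values for the true mean: at a 95% confidence level, about 95% of intervals built this way from repeated samples would contain it.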
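The three examples repeat the same steps, so they can be wrapped in a small function. The sketch below is one possible way to do this; mean_ci() is a hypothetical helper name introduced here, not a base R function.

```r
# mean_ci(): hypothetical helper that returns a t-based CI for the mean of x
mean_ci <- function(x, conf_level = 0.95) {
  x <- x[!is.na(x)]                          # drop missing values
  n <- length(x)
  stopifnot(n >= 2, conf_level > 0, conf_level < 1)
  alpha <- 1 - conf_level
  margin <- qt(1 - alpha / 2, df = n - 1) * sd(x) / sqrt(n)
  c(lower = mean(x) - margin, upper = mean(x) + margin)
}

# Usage with the study-hours data from Example 2
study_hours <- c(10, 13, 14, 15, 12, 11, 17, 14, 13, 16, 18, 15)
ci_99 <- mean_ci(study_hours, conf_level = 0.99)
print(round(ci_99, 2))
```

Applied to the study-hours data at the 99% level, this should reproduce the bounds reported in Example 2.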

Part 4: Apply Your Skills

Question 1: Built-in Data

Use mtcars$mpg to construct a 95% confidence interval for the average miles per gallon.
Using the same mpg column of the mtcars dataset, calculate both a 90% and a 99% confidence interval. How does the width of the interval change?

# Your code here:

Question 2: Daily Screen Time Analysis

Q1. Data Collection (Given) Suppose you collected screen time data (in hours per day) from 20 classmates, friends, or family members. Here is the collected data:

screen_time <- c(6.2, 7.1, 5.8, 6.5, 7.9, 6.3, 5.5, 7.4, 6.0, 6.7,
                 6.8, 7.2, 5.9, 6.1, 7.5, 6.6, 6.4, 7.0, 6.9, 5.7)

Q2. Compute Summary Statistics Use R to compute the following for the dataset above:

- Sample mean
- Sample standard deviation
- Sample size

Q3. Construct a 95% Confidence Interval Use the t-distribution to construct a 95% confidence interval for the average daily screen time. Show all R code and report the lower and upper bounds of the confidence interval.

Q4. Interpret Your Interval

Q5. What If…? If your sample had twice as many people (n = 40) but a similar standard deviation, how would you expect the width of the confidence interval to change?

Q6. Potential Sources of Bias Think critically about the data collection process. List at least two potential sources of bias that could affect the accuracy of your results.

# Your code here:

Confidence Interval Tutorial in R

Yulei Pang

2025-04-15