In this tutorial, we’ll learn how to construct confidence intervals for a population mean when the population standard deviation is unknown, using the t-distribution.
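\[ \text{CI} = \bar{x} \pm t^* \cdot \left( \frac{s}{\sqrt{n}} \right) \]
Here \(\bar{x}\) is the sample mean, \(s\) is the sample standard deviation, \(n\) is the sample size, and \(t^*\) is the critical value from the t-distribution with \(n - 1\) degrees of freedom for the chosen confidence level. The quantity \(s / \sqrt{n}\) is the standard error of the mean.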
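The formula translates almost line for line into R. The following is a minimal sketch, assuming a hypothetical helper named `t_ci()` and a small made-up vector; neither is part of the tutorial's own code, and the worked examples in this tutorial spell out the same steps without a helper.

```r
# Hypothetical helper (an illustration only): t-based confidence interval for a mean.
t_ci <- function(x, conf_level = 0.95) {
  x <- x[!is.na(x)]                          # drop missing values
  n <- length(x)
  stopifnot(n >= 2)                          # sd() needs at least two observations
  alpha <- 1 - conf_level
  t_star <- qt(1 - alpha / 2, df = n - 1)    # critical value with n - 1 degrees of freedom
  margin_error <- t_star * sd(x) / sqrt(n)   # t* times the standard error
  c(lower = mean(x) - margin_error, upper = mean(x) + margin_error)
}

# Usage on a small made-up vector
example_ci <- t_ci(c(4.1, 5.3, 4.8, 5.0, 4.6))
print(round(example_ci, 2))
```

Wrapping the steps in a function is a convenience choice; it avoids retyping the same five lines for every dataset and confidence level.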
# Load dataset
library(palmerpenguins)
data(penguins)
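# Clean data: na.omit() drops every row with an NA in any column,
# not only rows missing flipper length (333 of 344 penguins remain)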
penguins_clean <- na.omit(penguins)
# Summary statistics for flipper length
x_bar <- mean(penguins_clean$flipper_length_mm)
s <- sd(penguins_clean$flipper_length_mm)
n <- length(penguins_clean$flipper_length_mm)
# 95% Confidence Interval
alpha <- 0.05
t_star <- qt(1 - alpha/2, df = n - 1)
margin_error <- t_star * (s / sqrt(n))
ci_lower <- x_bar - margin_error
ci_upper <- x_bar + margin_error
cat("95% CI for Flipper Length:", round(ci_lower, 2), "to", round(ci_upper, 2), "mm\n")
## 95% CI for Flipper Length: 199.46 to 202.48 mm
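To see why the t-distribution matters more for small samples than for a sample of several hundred penguins, it can help to compare t critical values with the normal critical value. This is a sketch only; the list of degrees of freedom is an illustrative choice, with 332 corresponding to the 333 complete penguin rows.

```r
# Compare t critical values with the normal critical value at 95% confidence
alpha <- 0.05
df_values <- c(5, 11, 30, 100, 332)            # illustrative degrees of freedom
t_crit <- qt(1 - alpha / 2, df = df_values)    # qt() is vectorised over df
z_crit <- qnorm(1 - alpha / 2)                 # normal critical value

comparison <- data.frame(
  df = df_values,
  t_star = round(t_crit, 3),
  z_star = round(z_crit, 3)
)
print(comparison)
```

The t critical value is always larger than the normal one and shrinks toward it as the degrees of freedom grow, so the extra width of a t-based interval is most noticeable when n is small.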
# Sample data
study_hours <- c(10, 13, 14, 15, 12, 11, 17, 14, 13, 16, 18, 15)
# Compute stats
x_bar <- mean(study_hours)
s <- sd(study_hours)
n <- length(study_hours)
# 99% CI
alpha <- 0.01
t_star <- qt(1 - alpha/2, df = n - 1)
margin_error <- t_star * (s / sqrt(n))
ci_lower <- x_bar - margin_error
ci_upper <- x_bar + margin_error
cat("99% CI for Study Hours:", round(ci_lower, 2), "to", round(ci_upper, 2), "hours\n")
## 99% CI for Study Hours: 11.87 to 16.13 hours
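As a cross-check on the manual calculation, base R's `t.test()` reports the same t-based interval through its `conf.int` component. The sketch below repeats the study-hours vector so that it runs on its own; the variable names `tt` and `ci_builtin` are introduced here for illustration.

```r
# Cross-check the manual 99% interval with the built-in one-sample t-test
study_hours <- c(10, 13, 14, 15, 12, 11, 17, 14, 13, 16, 18, 15)

tt <- t.test(study_hours, conf.level = 0.99)   # one-sample t-test, 99% confidence
ci_builtin <- as.numeric(tt$conf.int)          # strip the conf.level attribute

cat("99% CI from t.test():", round(ci_builtin, 2), "\n")
```

Doing the calculation by hand first and then confirming it with `t.test()` is a useful habit: the manual steps show where each number comes from, and the built-in function guards against arithmetic slips.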
Simulate 100 values from a normal distribution with mean 50 and standard deviation 10, then compute a 90% confidence interval for the mean.
set.seed(123)
sim_data <- rnorm(100, mean = 50, sd = 10)
x_bar <- mean(sim_data)
s <- sd(sim_data)
n <- length(sim_data)
alpha <- 0.10
t_star <- qt(1 - alpha/2, df = n - 1)
margin_error <- t_star * (s / sqrt(n))
ci_lower <- x_bar - margin_error
ci_upper <- x_bar + margin_error
cat("90% CI for Simulated Data Mean:", round(ci_lower, 2), "to", round(ci_upper, 2), "\n")
## 90% CI for Simulated Data Mean: 49.39 to 52.42
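Because the true mean is known in a simulation, the same setup can illustrate what "90% confidence" means: across many repeated samples, roughly 90% of the intervals should contain the true mean of 50. The sketch below is one way to check this; the seed and the number of repetitions (`n_reps`) are arbitrary assumptions, not values from the tutorial.

```r
# Estimate how often a 90% t-interval captures the true mean
set.seed(2024)        # arbitrary seed, chosen only for reproducibility
n_reps <- 2000        # number of simulated samples (illustrative choice)
n <- 100
true_mean <- 50
alpha <- 0.10
t_star <- qt(1 - alpha / 2, df = n - 1)

covered <- replicate(n_reps, {
  x <- rnorm(n, mean = true_mean, sd = 10)
  margin_error <- t_star * sd(x) / sqrt(n)
  abs(mean(x) - true_mean) <= margin_error   # TRUE if the interval contains 50
})

coverage <- mean(covered)
cat("Share of 90% intervals containing the true mean:", round(coverage, 3), "\n")
```

Any single interval either contains the true mean or does not; the confidence level describes the long-run success rate of the procedure, not the probability for one particular interval.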
- Use qt() to find the critical value.
- Use mtcars$mpg to construct a 95% confidence interval for average miles per gallon.
# Your code here:
Q1. Data Collection (Given)
Suppose you collected screen time data (in hours per day) from 20 classmates, friends, or family members. Here is the collected data:
screen_time <- c(6.2, 7.1, 5.8, 6.5, 7.9, 6.3, 5.5, 7.4, 6.0, 6.7,
6.8, 7.2, 5.9, 6.1, 7.5, 6.6, 6.4, 7.0, 6.9, 5.7)
Q2. Compute Summary Statistics
Use R to compute the following for the dataset above:
- Sample mean
- Sample standard deviation
- Sample size
Q3. Construct a 95% Confidence Interval
Use the t-distribution to construct a 95% confidence interval for the average daily screen time. Show all R code and report the lower and upper bounds of the confidence interval.
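Q4. Interpret Your Interval
In one or two sentences, explain what your confidence interval says about the average daily screen time in the population you sampled from.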
Q5. What If…?
If your sample had twice as many people (n = 40) but a similar standard deviation, how would you expect the width of the confidence interval to change?
Q7. Potential Sources of Bias
Think critically about the data collection process. List at least two potential sources of bias that could affect the accuracy of your results.
# Your code here: