What is a t-test?

Statistical test to compare a mean against a reference value, or means between two groups
Helps determine whether an observed difference is statistically significant
Common types:
- One-sample t-test: Compare a sample mean to a known value
- Independent two-sample t-test: Compare means of two independent groups
- Paired t-test: Compare means of paired observations

Understanding Key Terms

Null Hypothesis (H₀): No difference between means
Alternative Hypothesis (H₁): There is a difference between means
p-value: Probability of observing a difference at least as large as the one in the sample, assuming H₀ is true
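Significance level (α): Typically 0.05 for small datasets (fewer than about 150 observations); for larger datasets a stricter threshold (e.g., 0.01 or 0.001) is often preferred, because with many observations even trivially small differences become statistically significant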
Confidence Interval: Range of plausible values for the true mean or true difference in means
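To illustrate why a stricter α is often suggested for large datasets, the sketch below simulates one million observations whose true mean differs from the hypothesized value by only 0.01. The simulated mean (20.01), SD (1), and seed are arbitrary assumptions chosen for illustration, not values taken from mtcars.

```r
# Sketch: with a very large sample, a practically negligible difference
# still yields a tiny p-value. Mean 20.01 and SD 1 are illustrative choices.
set.seed(42)
big_sample <- rnorm(1e6, mean = 20.01, sd = 1)

big_test <- t.test(big_sample, mu = 20)

# Observed difference from the hypothesized mean (roughly 0.01)
observed_diff <- mean(big_sample) - 20
print(observed_diff)

# The p-value falls far below 0.05, 0.01, and 0.001 despite the tiny difference
print(big_test$p.value)
```

Whether a 0.01 difference matters is a practical question that the p-value alone cannot answer.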

One-Sample t-test Example

Let’s test if the average MPG of cars in mtcars differs from 20 MPG

# Perform one-sample t-test
t_test_result <- t.test(mtcars$mpg, mu = 20)
t_test_result

## 
##  One Sample t-test
## 
## data:  mtcars$mpg
## t = 0.08506, df = 31, p-value = 0.9328
## alternative hypothesis: true mean is not equal to 20
## 95 percent confidence interval:
##  17.91768 22.26357
## sample estimates:
## mean of x 
##  20.09062

Interpretation:

Mean MPG: 20.09
p-value: 0.9328 (not significant)
We cannot conclude that the true mean MPG differs from 20
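The statistic, p-value, and confidence interval reported by t.test() follow directly from the one-sample formula t = (x̄ − μ₀) / (s / √n) with n − 1 degrees of freedom. A minimal base-R sketch that reproduces them by hand:

```r
# Reproduce the one-sample t-test by hand
x   <- mtcars$mpg
mu0 <- 20
n   <- length(x)
se  <- sd(x) / sqrt(n)          # standard error of the mean

t_stat <- (mean(x) - mu0) / se  # t = (sample mean - mu0) / SE
dof    <- n - 1                 # degrees of freedom

# Two-sided p-value: probability of |T| >= |t_stat| under H0
p_value <- 2 * pt(-abs(t_stat), dof)

# 95% confidence interval: mean +/- t critical value * SE
ci <- mean(x) + c(-1, 1) * qt(0.975, dof) * se

round(c(t = t_stat, df = dof, p = p_value), 4)
round(ci, 5)
```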

Visualizing One-Sample t-test

library(ggplot2)

ggplot(mtcars, aes(x = mpg)) +
  geom_histogram(binwidth = 2, fill = "skyblue", color = "black") +
  geom_vline(xintercept = 20, color = "red", linetype = "dashed") +
  annotate("text", x = 21, y = 8, label = "H₀: μ = 20") +
  labs(title = "Distribution of MPG",
       x = "Miles per Gallon",
       y = "Count")

Independent Two-Sample t-test

Let’s compare MPG between automatic and manual transmission cars

# Convert am to factor
mtcars$am <- factor(mtcars$am, labels = c("Automatic", "Manual"))

# Perform two-sample t-test
t_test_trans <- t.test(mpg ~ am, data = mtcars)
t_test_trans

## 
##  Welch Two Sample t-test
## 
## data:  mpg by am
## t = -3.7671, df = 18.332, p-value = 0.001374
## alternative hypothesis: true difference in means between group Automatic and group Manual is not equal to 0
## 95 percent confidence interval:
##  -11.280194  -3.209684
## sample estimates:
## mean in group Automatic    mean in group Manual 
##                17.14737                24.39231

Visualizing Two-Sample t-test

ggplot(mtcars, aes(x = am, y = mpg, fill = am)) +
  geom_boxplot() +
  stat_summary(fun = mean, geom = "point", shape = 18, size = 3, color = "red") +
  labs(title = "MPG by Transmission Type",
       x = "Transmission",
       y = "Miles per Gallon") +
  theme_minimal()

Interpreting Two-Sample t-test Results

For transmission type comparison:

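Mean difference (Manual - Automatic): 7.24 MPG
p-value: 0.001374 (significant at α = 0.05)
95% CI for Manual - Automatic: [3.21, 11.28] (R reports the equivalent interval [-11.28, -3.21] for Automatic - Manual)
Interpretation:
- Manual transmission cars have significantly higher mean MPG
- We’re 95% confident the true difference is between 3.21 and 11.28 MPG
- The small p-value means a difference this large would be unlikely if the true group means were equal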
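Rather than copying numbers from the printed output, the mean difference, confidence interval, and p-value can be pulled straight from the object returned by t.test(). This sketch works on a local copy of mtcars (an assumption made so the block is self-contained) and flips the sign of R's interval to express it as Manual - Automatic.

```r
# Work on a local copy so the original data frame is untouched
cars <- datasets::mtcars
cars$am <- factor(cars$am, labels = c("Automatic", "Manual"))

res <- t.test(mpg ~ am, data = cars)

# Group means; the estimate vector is ordered Automatic, Manual
means <- res$estimate
diff_manual_minus_auto <- unname(means[2] - means[1])

# R's interval is for Automatic - Manual; negate and reverse it
# to get the interval for Manual - Automatic
ci_manual_minus_auto <- rev(-as.numeric(res$conf.int))

round(diff_manual_minus_auto, 2)
round(ci_manual_minus_auto, 2)
res$p.value
```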

Paired t-test Example

Let’s simulate before/after data for a fuel efficiency modification:

# Simulate paired data
set.seed(123)
before <- mtcars$mpg
after <- before + rnorm(32, mean = 2, sd = 1)
paired_data <- data.frame(before, after)

# Perform paired t-test
paired_test <- t.test(after, before, paired = TRUE)
paired_test

## 
##  Paired t-test
## 
## data:  after and before
## t = 11.626, df = 31, p-value = 7.811e-13
## alternative hypothesis: true mean difference is not equal to 0
## 95 percent confidence interval:
##  1.616110 2.303783
## sample estimates:
## mean difference 
##        1.959946
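A paired t-test is equivalent to a one-sample t-test on the within-pair differences (after - before) against a mean of 0. The sketch below repeats the simulation above and checks that both approaches give the same result.

```r
# Same simulated before/after data as above
set.seed(123)
before <- mtcars$mpg
after  <- before + rnorm(32, mean = 2, sd = 1)

paired_res <- t.test(after, before, paired = TRUE)
diff_res   <- t.test(after - before, mu = 0)  # one-sample test on differences

# Identical t statistics (and likewise df and p-values)
c(paired = unname(paired_res$statistic),
  one_sample = unname(diff_res$statistic))
```

This is why the normality assumption for the paired test concerns the differences, not the raw before and after values.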

Visualizing Paired t-test

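paired_long <- data.frame(
  id   = rep(seq_along(before), times = 2),  # pair identifier for connecting lines
  mpg  = c(before, after),
  time = factor(rep(c("Before", "After"), each = length(before)),
                levels = c("Before", "After"))  # show Before on the left
)

ggplot(paired_long, aes(x = time, y = mpg)) +
  geom_boxplot(aes(fill = time)) +
  geom_line(aes(group = id), alpha = 0.2) +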
  labs(title = "MPG Before and After Modification",
       x = "Time",
       y = "Miles per Gallon") +
  theme_minimal()

t-test Assumptions

Normality
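- Data in each group (or, for the paired test, the differences) should be approximately normally distributed
- Can check using QQ plots
Equal Variances (for independent t-test)
- Can check with Levene’s test (e.g., car::leveneTest)
- Alternative: Welch’s t-test, which does not assume equal variances (the default in R’s t.test(), var.equal = FALSE)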

par(mfrow = c(1, 2))
qqnorm(mtcars$mpg[mtcars$am == "Automatic"], main = "QQ Plot: Automatic")
qqline(mtcars$mpg[mtcars$am == "Automatic"])
qqnorm(mtcars$mpg[mtcars$am == "Manual"], main = "QQ Plot: Manual")
qqline(mtcars$mpg[mtcars$am == "Manual"])
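The QQ plots can be complemented with formal checks using only base R. In this sketch, Levene's test is computed by hand as a one-way ANOVA on absolute deviations from each group mean (an assumption on our part; car::leveneTest centres on the median by default, so its p-value can differ), and Student's and Welch's t-tests are run side by side.

```r
# Local copy of mtcars with labelled transmission groups
cars <- datasets::mtcars
cars$am <- factor(cars$am, labels = c("Automatic", "Manual"))

# Normality within each group: Shapiro-Wilk test (H0: data are normal)
shapiro_p <- tapply(cars$mpg, cars$am, function(g) shapiro.test(g)$p.value)

# Levene's test by hand: ANOVA on absolute deviations from group means
abs_dev  <- abs(cars$mpg - ave(cars$mpg, cars$am))
levene_p <- anova(lm(abs_dev ~ cars$am))[["Pr(>F)"]][1]

# Student's t-test pools the variances; Welch's (R's default) does not
student <- t.test(mpg ~ am, data = cars, var.equal = TRUE)
welch   <- t.test(mpg ~ am, data = cars)

shapiro_p
levene_p
c(student_df = unname(student$parameter), welch_df = unname(welch$parameter))
```

Student's test uses n₁ + n₂ − 2 degrees of freedom, while Welch's uses a fractional value that shrinks when the group variances differ.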

Common Mistakes to Avoid

Using t-test when assumptions are violated
- Consider non-parametric alternatives (e.g., Wilcoxon test)
Multiple testing without correction
- Increases risk of false positives
- Use Bonferroni or other corrections
Confusing statistical and practical significance
- Small p-value ≠ Important difference
- Consider effect size and practical implications
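Two of the remedies above can be sketched in base R: a Wilcoxon rank-sum test as a non-parametric alternative, and p.adjust() to correct several p-values at once. Comparing mpg across both am and vs is an illustrative choice of "multiple tests", not an analysis from the original material.

```r
cars <- datasets::mtcars
cars$am <- factor(cars$am, labels = c("Automatic", "Manual"))
cars$vs <- factor(cars$vs, labels = c("V-shaped", "Straight"))

# Non-parametric alternative: Wilcoxon rank-sum (Mann-Whitney) test.
# exact = FALSE because mpg has ties, so an exact p-value cannot be computed.
wilcox_res <- wilcox.test(mpg ~ am, data = cars, exact = FALSE)

# Multiple testing: adjust p-values when several comparisons are run
raw_p <- c(
  am = t.test(mpg ~ am, data = cars)$p.value,
  vs = t.test(mpg ~ vs, data = cars)$p.value
)
adjusted <- rbind(
  raw        = raw_p,
  bonferroni = p.adjust(raw_p, method = "bonferroni"),
  holm       = p.adjust(raw_p, method = "holm")  # less conservative than Bonferroni
)

wilcox_res$p.value
round(adjusted, 5)
```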

Effect Size (Cohen’s d)

# Calculate Cohen's d for transmission comparison
library(effsize)
cohen.d(mpg ~ am, data = mtcars)

## 
## Cohen's d
## 
## d estimate: -1.477947 (large)
## 95 percent confidence interval:
##     lower     upper 
## -2.304209 -0.651685
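Cohen's d is the difference in group means divided by the pooled standard deviation. The sketch below recomputes the estimate without effsize, assuming the package's default pooled-SD formula with no Hedges correction; the sign follows the output above (Automatic - Manual).

```r
# Cohen's d by hand: (mean1 - mean2) / pooled SD
cars   <- datasets::mtcars
auto   <- cars$mpg[cars$am == 0]   # am = 0: automatic
manual <- cars$mpg[cars$am == 1]   # am = 1: manual

n1 <- length(auto)
n2 <- length(manual)

# Pooled SD weights each group's variance by its degrees of freedom
pooled_sd <- sqrt(((n1 - 1) * var(auto) + (n2 - 1) * var(manual)) / (n1 + n2 - 2))

d <- (mean(auto) - mean(manual)) / pooled_sd
round(d, 4)
```

By Cohen's common rule of thumb, |d| above 0.8 counts as a large effect, consistent with the "(large)" label effsize prints.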