The Student’s t-test is a statistical workhorse for judging whether a difference in means (between two groups, or between one sample and a reference value) is larger than chance alone would plausibly produce. Developed by William Sealy Gosset under the pen name “Student,” it is especially useful for small samples where the population standard deviation is unknown and must be estimated from the data.
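The one-sample t-test checks whether a sample mean (\(\bar{x}\)) differs from a theoretical population mean (\(\mu\)), where \(s\) is the sample standard deviation and \(n\) is the sample size: \[t = \frac{\bar{x} - \mu}{s / \sqrt{n}}\]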
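As a quick check on the formula, the statistic can be computed by hand and compared with what `t.test()` reports. This is only a sketch: the weights in `x` are hypothetical values invented for illustration, and the two-sided p-value is taken from the t distribution with \(n - 1\) degrees of freedom.

```r
# Hypothetical weights in grams (illustrative values, not from the article)
x  <- c(49.1, 50.3, 48.7, 49.9, 50.8, 48.5, 49.6, 50.1)
mu <- 50                      # hypothesised population mean

n <- length(x)
s <- sd(x)                    # sample standard deviation (n - 1 denominator)

# t = (x_bar - mu) / (s / sqrt(n))
t_manual <- (mean(x) - mu) / (s / sqrt(n))
df       <- n - 1
p_manual <- 2 * pt(-abs(t_manual), df)   # two-sided p-value

fit <- t.test(x, mu = mu)

# Both comparisons should return TRUE
all.equal(t_manual, unname(fit$statistic))
all.equal(p_manual, fit$p.value)
```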
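The independent two-sample t-test compares two unrelated groups (\(A\) and \(B\)). The form below is Welch's version, which does not assume equal variances and is what R's t.test() uses by default: \[t = \frac{\bar{x}_A - \bar{x}_B}{\sqrt{\frac{s_A^2}{n_A} + \frac{s_B^2}{n_B}}}\]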
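The two-sample formula can be verified the same way. The sketch below uses hypothetical measurements for groups `a` and `b` and also computes the Welch-Satterthwaite approximation for the degrees of freedom, which is why the Welch test in R reports a non-integer `df`.

```r
# Hypothetical measurements for two unrelated groups (illustrative values)
a <- c(12.1, 11.4, 13.0, 12.7, 11.9, 12.5)
b <- c(10.8, 11.9, 10.2, 11.1, 10.5, 11.6, 10.9)

va <- var(a) / length(a)      # s_A^2 / n_A
vb <- var(b) / length(b)      # s_B^2 / n_B

# t = (mean_A - mean_B) / sqrt(s_A^2 / n_A + s_B^2 / n_B)
t_manual <- (mean(a) - mean(b)) / sqrt(va + vb)

# Welch-Satterthwaite approximation for the degrees of freedom
df_manual <- (va + vb)^2 /
  (va^2 / (length(a) - 1) + vb^2 / (length(b) - 1))

fit <- t.test(a, b)           # var.equal = FALSE by default, i.e. Welch

# Both comparisons should return TRUE
all.equal(t_manual,  unname(fit$statistic))
all.equal(df_manual, unname(fit$parameter))
```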
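The paired t-test is used when the same subjects are measured twice. It is essentially a one-sample t-test on the within-subject differences (\(d\)), where \(\bar{d}\) is their mean, \(s_d\) their standard deviation, and \(n\) the number of pairs: \[t = \frac{\bar{d}}{s_d / \sqrt{n}}\]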
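The equivalence between the paired test and a one-sample test on the differences is easy to confirm. In this sketch the readings in `x_before` and `x_after` are hypothetical values for six subjects, chosen only to illustrate the point.

```r
# Hypothetical before/after readings for the same six subjects
x_before <- c(82, 75, 90, 68, 77, 85)
x_after  <- c(85, 79, 91, 73, 80, 86)

d <- x_after - x_before       # within-subject differences

# t = d_bar / (s_d / sqrt(n))
t_manual <- mean(d) / (sd(d) / sqrt(length(d)))

paired_fit     <- t.test(x_after, x_before, paired = TRUE)
one_sample_fit <- t.test(d, mu = 0)

# All three routes should give the same statistic
all.equal(t_manual, unname(paired_fit$statistic))
all.equal(unname(paired_fit$statistic), unname(one_sample_fit$statistic))
```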
Scenario: A factory produces protein bars labeled 50g. An inspector samples 15 bars to check that the factory is not under-filling them.
set.seed(42)
energy_bars <- rnorm(15, mean = 48.8, sd = 1.5)
# One-Sample T-Test
t_test_one <- t.test(energy_bars, mu = 50)
print(t_test_one)
##
## One Sample t-test
##
## data: energy_bars
## t = -1.1928, df = 14, p-value = 0.2528
## alternative hypothesis: true mean is not equal to 50
## 95 percent confidence interval:
## 48.67471 50.37799
## sample estimates:
## mean of x
## 49.52635
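Because the inspector's concern is specifically under-filling, a one-sided alternative is arguably a closer match to the scenario than the default two-sided test. The sketch below regenerates the same simulated sample and passes `alternative = "less"`; since the observed t statistic is negative, the one-sided p-value is half the two-sided one.

```r
set.seed(42)
energy_bars <- rnorm(15, mean = 48.8, sd = 1.5)

# Default: two-sided test (H1: true mean != 50)
two_sided <- t.test(energy_bars, mu = 50)

# One-sided test (H1: true mean < 50), matching the under-filling question
one_sided <- t.test(energy_bars, mu = 50, alternative = "less")

c(two_sided = two_sided$p.value, one_sided = one_sided$p.value)
```

Either way the p-value stays above 0.05, so this particular sample does not provide convincing evidence of under-filling.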
library(ggplot2)  # required for ggplot() and the geoms below

ggplot(data.frame(weight = energy_bars), aes(x = weight)) +
  geom_density(fill = "skyblue", alpha = 0.5) +
  # linewidth replaces the deprecated size argument for lines (ggplot2 >= 3.4.0)
  geom_vline(xintercept = 50, color = "red", linetype = "dashed", linewidth = 1) +
  theme_minimal() +
  labs(title = "Energy Bar Weight Verification", x = "Weight (g)")
Figure 1: Sample distribution vs. Target Weight
Scenario: Testing if a new drug lowers blood pressure more effectively than a placebo.
control <- rnorm(30, mean = 140, sd = 10)
treatment <- rnorm(30, mean = 132, sd = 10)
bp_data <- data.frame(
Group = rep(c("Control", "Treatment"), each = 30),
BP = c(control, treatment)
)
# Independent T-Test (Formula method works here)
t_test_ind <- t.test(BP ~ Group, data = bp_data)
print(t_test_ind)
##
## Welch Two Sample t-test
##
## data: BP by Group
## t = 1.0688, df = 55.548, p-value = 0.2898
## alternative hypothesis: true difference in means between group Control and group Treatment is not equal to 0
## 95 percent confidence interval:
## -2.566797 8.436714
## sample estimates:
## mean in group Control mean in group Treatment
## 136.5544 133.6194
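Note that the output is labelled "Welch", not "Student": R does not assume equal variances unless told to. The sketch below compares the default with the classic pooled-variance test via `var.equal = TRUE`. It uses its own arbitrary seed (an assumption of this sketch), so the numbers will not match the output above.

```r
set.seed(1)  # arbitrary seed chosen for this sketch
control   <- rnorm(30, mean = 140, sd = 10)
treatment <- rnorm(30, mean = 132, sd = 10)

bp_data <- data.frame(
  Group = rep(c("Control", "Treatment"), each = 30),
  BP    = c(control, treatment)
)

welch  <- t.test(BP ~ Group, data = bp_data)                    # default
pooled <- t.test(BP ~ Group, data = bp_data, var.equal = TRUE)  # classic Student

# Side-by-side summary of the two variants
data.frame(
  method  = trimws(c(welch$method, pooled$method)),
  df      = unname(c(welch$parameter, pooled$parameter)),
  p_value = c(welch$p.value, pooled$p.value)
)
```

With equal group sizes the two t statistics coincide and only the degrees of freedom differ, so the choice matters most when group sizes and variances are both unequal.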
ggplot(bp_data, aes(x = Group, y = BP, fill = Group)) +
geom_boxplot(alpha = 0.7) +
stat_summary(fun = mean, geom = "point", color = "red", size = 3) +
theme_light()
Figure 2: Blood Pressure comparison between groups
Scenario: Students take a math test Before and After a specific training bootcamp. We are testing the improvement within the same students.
Note: In recent versions of R, the formula interface to t.test() rejects paired = TRUE with an error, so we pass the two time points as separate vectors, listed in the same student order.
# Data Generation
before <- c(65, 70, 68, 72, 55, 60, 80, 75, 66, 70)
after <- c(72, 75, 70, 78, 58, 65, 82, 80, 70, 74)
# Paired t-test using the vector interface (after - before)
t_test_paired <- t.test(after, before, paired = TRUE)
print(t_test_paired)
##
## Paired t-test
##
## data: after and before
## t = 8.3096, df = 9, p-value = 1.632e-05
## alternative hypothesis: true mean difference is not equal to 0
## 95 percent confidence interval:
## 3.129396 5.470604
## sample estimates:
## mean difference
## 4.3
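To see why the pairing matters, the same scores can be run through an unpaired test. This sketch reuses the bootcamp scores from above; the only change is dropping `paired = TRUE`, which makes R treat the two vectors as independent groups.

```r
before <- c(65, 70, 68, 72, 55, 60, 80, 75, 66, 70)
after  <- c(72, 75, 70, 78, 58, 65, 82, 80, 70, 74)

paired_fit   <- t.test(after, before, paired = TRUE)
unpaired_fit <- t.test(after, before)   # Welch test, ignores the pairing

c(paired = paired_fit$p.value, unpaired = unpaired_fit$p.value)
```

The unpaired p-value is far larger because the spread between students swamps the consistent gain within each student, which is exactly the variation the paired test removes.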
# Data for plotting
paired_df <- data.frame(
ID = rep(1:10, 2),
Time = factor(rep(c("Before", "After"), each = 10), levels = c("Before", "After")),
Score = c(before, after)
)
ggplot(paired_df, aes(x = Time, y = Score, group = ID)) +
geom_line(color = "gray") +
geom_point(aes(color = Time), size = 3) +
theme_minimal() +
labs(title = "Pre- vs Post-Bootcamp Scores")
Figure 3: Individual Student Progress (Paired Data)
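If the data only exist in long format, as in `paired_df`, the two vectors can be pulled back out before calling the test. The sketch below is one possible approach, not the only one; it sorts by `ID` within each time point so that element i of each vector refers to the same student.

```r
before <- c(65, 70, 68, 72, 55, 60, 80, 75, 66, 70)
after  <- c(72, 75, 70, 78, 58, 65, 82, 80, 70, 74)

paired_df <- data.frame(
  ID    = rep(1:10, 2),
  Time  = factor(rep(c("Before", "After"), each = 10),
                 levels = c("Before", "After")),
  Score = c(before, after)
)

# Order rows by time point, then by student, so the pairs line up
paired_df <- paired_df[order(paired_df$Time, paired_df$ID), ]

score_before <- paired_df$Score[paired_df$Time == "Before"]
score_after  <- paired_df$Score[paired_df$Time == "After"]

fit_long <- t.test(score_after, score_before, paired = TRUE)
fit_long$statistic
```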
| Test Type | Input Format in R | Purpose |
|---|---|---|
| One-Sample | t.test(x, mu = value) | Compare sample to a standard. |
| Independent | t.test(y ~ x) | Compare two different groups. |
| Paired | t.test(vec1, vec2, paired = TRUE) | Compare same group at two times. |
The t-test is a robust tool for hypothesis testing. By understanding whether your data is independent or paired, and selecting the correct R syntax, you can accurately assess the significance of your experimental results.
Two details from the paired example are worth noting. The test is called as t.test(after, before, paired = TRUE); this bypasses the formula limitation and is a standard way to run paired tests in R. The plotting data sets levels = c("Before", "After") to ensure the plot shows “Before” first (chronological order) rather than alphabetical order.