1. Introduction to the T-Test

Student’s t-test is a statistical test used to decide whether the difference between two group means (or between a sample mean and a reference value) is statistically significant or plausibly due to chance. Developed by William Sealy Gosset under the pen name “Student,” it is designed for small samples where the population standard deviation is unknown.


2. Mathematical Foundations

2.1 The One-Sample T-Test Formula

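To test whether a sample mean (\(\bar{x}\)) differs from a theoretical population mean (\(\mu\)): \[t = \frac{\bar{x} - \mu}{s / \sqrt{n}}\] where \(s\) is the sample standard deviation and \(n\) is the sample size. Under the null hypothesis, \(t\) follows a t-distribution with \(n - 1\) degrees of freedom.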
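To make the one-sample formula concrete, the following minimal sketch computes the statistic by hand and checks it against t.test(). The vector x and the reference value mu0 are made-up illustration values, not data from this document.

```r
# Hypothetical sample and reference mean (illustration only)
x   <- c(49.1, 50.4, 48.7, 51.2, 49.8, 50.9, 48.5, 49.6)
mu0 <- 50

n        <- length(x)
t_manual <- (mean(x) - mu0) / (sd(x) / sqrt(n))  # (x-bar - mu) / (s / sqrt(n))
df       <- n - 1                                # degrees of freedom

# Two-sided p-value from the t distribution
p_manual <- 2 * pt(-abs(t_manual), df)

# Built-in test for comparison
res <- t.test(x, mu = mu0)
all.equal(t_manual, unname(res$statistic))  # TRUE
all.equal(p_manual, res$p.value)            # TRUE
```

The hand computation and the built-in function should agree up to floating-point tolerance, which is what all.equal() checks.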

2.2 The Independent Two-Sample T-Test

Used to compare two unrelated groups (\(A\) and \(B\)). With a separate variance estimate for each group (the Welch form, which is R’s default): \[t = \frac{\bar{x}_A - \bar{x}_B}{\sqrt{\frac{s_A^2}{n_A} + \frac{s_B^2}{n_B}}}\]
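As a rough sketch, the two-sample statistic can be reproduced by hand. The vectors a and b are hypothetical. The degrees-of-freedom line uses the Welch-Satterthwaite approximation, which is not stated in the text above but is what t.test() applies when variances are not assumed equal, and is why R can report non-integer degrees of freedom.

```r
# Hypothetical measurements for two unrelated groups (illustration only)
a <- c(12.1, 11.4, 13.2, 12.8, 11.9, 12.5)
b <- c(10.9, 11.7, 10.2, 11.1, 12.0, 10.5, 11.3)

va <- var(a) / length(a)   # s_A^2 / n_A
vb <- var(b) / length(b)   # s_B^2 / n_B

t_manual <- (mean(a) - mean(b)) / sqrt(va + vb)

# Welch-Satterthwaite approximation for the degrees of freedom
df_manual <- (va + vb)^2 /
  (va^2 / (length(a) - 1) + vb^2 / (length(b) - 1))

res <- t.test(a, b)   # var.equal = FALSE is the default, i.e. Welch
all.equal(t_manual,  unname(res$statistic))  # TRUE
all.equal(df_manual, unname(res$parameter))  # TRUE
```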

2.3 The Paired T-Test

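Used when the same subjects are measured twice. It is equivalent to a one-sample t-test on the per-subject differences (\(d\)): \[t = \frac{\bar{d}}{s_d / \sqrt{n}}\] where \(\bar{d}\) is the mean difference, \(s_d\) is the standard deviation of the differences, and \(n\) is the number of pairs.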
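The equivalence with a one-sample test can be checked directly. This small sketch reuses the before and after scores from the bootcamp example in Section 5 and computes the statistic from the differences by hand.

```r
# Scores copied from the bootcamp example (Section 5)
before <- c(65, 70, 68, 72, 55, 60, 80, 75, 66, 70)
after  <- c(72, 75, 70, 78, 58, 65, 82, 80, 70, 74)

d <- after - before                              # per-student differences
t_manual <- mean(d) / (sd(d) / sqrt(length(d)))  # d-bar / (s_d / sqrt(n))
round(t_manual, 4)                               # 8.3096

# Same statistic from a one-sample test on the differences
res <- t.test(d, mu = 0)
all.equal(t_manual, unname(res$statistic))       # TRUE
```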


3. Real-Life Example 1: Quality Control (One-Sample)

Scenario: A factory produces energy bars labeled 50 g. An inspector samples 15 bars to check that they are not being under-filled.

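library(ggplot2)  # needed for the plots in this document

set.seed(42)  # make the simulated sample reproducible
energy_bars <- rnorm(15, mean = 48.8, sd = 1.5)

# One-sample t-test against the labeled weight (two-sided by default)
t_test_one <- t.test(energy_bars, mu = 50)
print(t_test_one)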
## 
##  One Sample t-test
## 
## data:  energy_bars
## t = -1.1928, df = 14, p-value = 0.2528
## alternative hypothesis: true mean is not equal to 50
## 95 percent confidence interval:
##  48.67471 50.37799
## sample estimates:
## mean of x 
##  49.52635
ggplot(data.frame(weight = energy_bars), aes(x = weight)) +
  geom_density(fill = "skyblue", alpha = 0.5) +
  geom_vline(xintercept = 50, color = "red", linetype = "dashed", linewidth = 1) +  # linewidth replaces size for lines in ggplot2 >= 3.4.0
  theme_minimal() +
  labs(title = "Energy Bar Weight Verification", x = "Weight (g)")
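Figure 1: Sample distribution vs. Target Weight

At the 5% level the test does not reject the hypothesis that the mean weight is 50 g (p = 0.25), and the 95% confidence interval (48.67 g to 50.38 g) contains the target.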
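The scenario asks specifically about under-filling, which corresponds to a one-sided alternative rather than the two-sided default. As a minimal sketch, the same simulated sample can be tested with alternative = "less"; treating the one-sided test as the right choice here is an assumption about the inspector's question, not something the example states.

```r
# Re-create the simulated sample from the example above
set.seed(42)
energy_bars <- rnorm(15, mean = 48.8, sd = 1.5)

two_sided <- t.test(energy_bars, mu = 50)                        # H1: mean != 50
one_sided <- t.test(energy_bars, mu = 50, alternative = "less")  # H1: mean < 50

# The sample mean is below 50, so the one-sided p-value is half the two-sided one
c(two_sided = two_sided$p.value, one_sided = one_sided$p.value)
```

The t statistic and degrees of freedom are identical in both calls; only the tail area used for the p-value changes.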


4. Real-Life Example 2: Healthcare (Independent T-Test)

Scenario: Testing whether a new drug lowers blood pressure more than a placebo.

# Simulated blood pressure readings (the random stream continues from set.seed(42) above)
control <- rnorm(30, mean = 140, sd = 10)
treatment <- rnorm(30, mean = 132, sd = 10)

bp_data <- data.frame(
  Group = rep(c("Control", "Treatment"), each = 30),
  BP = c(control, treatment)
)

# Independent (Welch) t-test; the formula method works for unpaired groups
t_test_ind <- t.test(BP ~ Group, data = bp_data)
print(t_test_ind)
## 
##  Welch Two Sample t-test
## 
## data:  BP by Group
## t = 1.0688, df = 55.548, p-value = 0.2898
## alternative hypothesis: true difference in means between group Control and group Treatment is not equal to 0
## 95 percent confidence interval:
##  -2.566797  8.436714
## sample estimates:
##   mean in group Control mean in group Treatment 
##                136.5544                133.6194
ggplot(bp_data, aes(x = Group, y = BP, fill = Group)) +
  geom_boxplot(alpha = 0.7) +
  stat_summary(fun = mean, geom = "point", color = "red", size = 3) +
  theme_light()
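Figure 2: Blood Pressure comparison between groups

The treatment group’s mean is about 2.9 points lower than the control group’s, but the difference is not statistically significant (p = 0.29), and the 95% confidence interval for the difference (-2.57 to 8.44) includes 0.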


5. Real-Life Example 3: Education (Paired T-Test)

Scenario: Students take a math test before and after a training bootcamp. We test for improvement within the same students.

Note: Recent versions of R reject paired = TRUE in the formula method of t.test(), so we pass the two time points as separate vectors.

# Scores for the same 10 students, listed in the same order
before <- c(65, 70, 68, 72, 55, 60, 80, 75, 66, 70)
after  <- c(72, 75, 70, 78, 58, 65, 82, 80, 70, 74)

# Paired t-test using the vector method (differences are after - before)
t_test_paired <- t.test(after, before, paired = TRUE)
print(t_test_paired)
## 
##  Paired t-test
## 
## data:  after and before
## t = 8.3096, df = 9, p-value = 1.632e-05
## alternative hypothesis: true mean difference is not equal to 0
## 95 percent confidence interval:
##  3.129396 5.470604
## sample estimates:
## mean difference 
##             4.3
# Data for plotting
paired_df <- data.frame(
  ID = rep(1:10, 2),
  Time = factor(rep(c("Before", "After"), each = 10), levels = c("Before", "After")),
  Score = c(before, after)
)
ggplot(paired_df, aes(x = Time, y = Score, group = ID)) +
  geom_line(color = "gray") +
  geom_point(aes(color = Time), size = 3) +
  theme_minimal() +
  labs(title = "Pre- vs Post-Bootcamp Scores")
Figure 3: Individual Student Progress (Paired Data)

Scores rose by 4.3 points on average, and the improvement is statistically significant (p < 0.001; 95% confidence interval 3.13 to 5.47).
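When the scores are already stored in a data frame, a formula-style call is also possible. The sketch below assumes R 4.0.0 or later, where Pair() is available in the stats package; on older versions this call is not expected to work. The wide data frame scores is a hypothetical layout used only for illustration.

```r
# Hypothetical wide layout: one row per student
scores <- data.frame(
  before = c(65, 70, 68, 72, 55, 60, 80, 75, 66, 70),
  after  = c(72, 75, 70, 78, 58, 65, 82, 80, 70, 74)
)

# Formula interface for paired data: Pair(x, y) ~ 1
res_formula <- t.test(Pair(after, before) ~ 1, data = scores)

# Vector interface, as used in the example above
res_vector <- t.test(scores$after, scores$before, paired = TRUE)

all.equal(unname(res_formula$statistic), unname(res_vector$statistic))  # TRUE
```

Both calls run the same paired test, so the choice is mostly a matter of how the data are stored.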


6. Assumptions & Diagnostics

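  1. Normality: The data (or, for a paired test, the differences) should be approximately normally distributed, so that the sample mean is as well. This matters most for small samples.
  2. Independence: Observations within each group, and between groups in an independent test, must not be related.
  3. Homogeneity of Variance: The classic pooled-variance test (var.equal = TRUE) requires similar spreads in both groups. R’s default Welch test, used in Section 4, does not.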

Normality Check (Q-Q Plot)

# Points lying close to the reference line suggest approximately normal data
qqnorm(energy_bars)
qqline(energy_bars, col = "steelblue")
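Formal tests can complement the visual check. The sketch below is one possible approach, not a required procedure: it uses shapiro.test() for normality and var.test() for equality of variances, both from base R's stats package, on hypothetical simulated groups. Note that the F test in var.test() is itself sensitive to non-normal data.

```r
set.seed(1)  # arbitrary seed so the sketch is reproducible
group_a <- rnorm(30, mean = 140, sd = 10)  # hypothetical data
group_b <- rnorm(30, mean = 132, sd = 10)

# Shapiro-Wilk test: a small p-value suggests departure from normality
shapiro_a <- shapiro.test(group_a)

# F test for equal variances: a small p-value suggests unequal spreads
var_check <- var.test(group_a, group_b)

# The pooled (Student) test assumes equal variances; Welch (the default) does not
pooled <- t.test(group_a, group_b, var.equal = TRUE)
welch  <- t.test(group_a, group_b)

c(shapiro_p = shapiro_a$p.value,
  var_p     = var_check$p.value,
  pooled_df = unname(pooled$parameter),
  welch_df  = unname(welch$parameter))
```

The pooled test always has n_A + n_B - 2 degrees of freedom, while the Welch degrees of freedom are at most that value and shrink as the group variances diverge.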


7. Summary Table

| Test Type | Input Format in R | Purpose |
|---|---|---|
| One-Sample | t.test(x, mu = value) | Compare sample to a standard. |
| Independent | t.test(y ~ x) | Compare two different groups. |
| Paired | t.test(vec1, vec2, paired = TRUE) | Compare same group at two times. |

8. Conclusion

The T-test is a robust tool for hypothesis testing. By understanding whether your data is independent or paired, and selecting the correct R syntax, you can accurately determine the significance of your experimental results.
