Hypothesis Testing

2025-11-09

What is Hypothesis Testing?

Hypothesis testing is a way of using data to make decisions. We test a statement (a hypothesis) about a population using a sample drawn from it.

Example: We want to know whether a new study method improves students' exam scores compared to a traditional method.

We test this using hypothesis testing:

Collect data
Compare averages
Decide whether the difference is real or due to random chance

Two types of Hypotheses

Null hypothesis (H₀): The new method has no effect. \(H_0: \mu_{new} = \mu_{old}\)
Alternative hypothesis (Hₐ): The new method improves scores. \(H_a: \mu_{new} > \mu_{old}\)

Math

For two groups of students: \[ t = \frac{\bar{x}_{new} - \bar{x}_{old}}{\sqrt{\frac{s_{new}^2}{n_{new}} + \frac{s_{old}^2}{n_{old}}}} \]

If the difference between the averages is large relative to its standard error (t is large and positive for this one-sided test), it is unlikely to have happened by chance, and H₀ is rejected.
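As a rough check of the formula above, the sketch below computes t by hand and compares it with R's built-in `t.test()`, which uses the same unequal-variance (Welch) form by default. The vectors `x_new` and `x_old` and the helper `welch_t()` are made up for illustration and are not part of the example data used later.

```r
# Hypothetical scores, invented only to illustrate the formula
x_new <- c(82, 79, 88, 75, 91, 84)
x_old <- c(74, 80, 69, 77, 72, 78)

# t = (difference in means) / sqrt(s_new^2 / n_new + s_old^2 / n_old)
welch_t <- function(a, b) {
  (mean(a) - mean(b)) / sqrt(var(a) / length(a) + var(b) / length(b))
}

t_manual  <- welch_t(x_new, x_old)
t_builtin <- unname(t.test(x_new, x_old)$statistic)

# The two values should agree up to floating-point error
all.equal(t_manual, t_builtin)
```

The numerator is the difference in sample means and the denominator is its estimated standard error, so t measures the difference in units of standard errors.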

Example

Study Methods vs Test Scores: 30 students in each group (old and new study method) are tested. The code below creates the data set. The data are randomly generated for illustration and are not real exam results.

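library(tibble)  # provides tibble()

# Simulate 30 exam scores per group (fixed seed for reproducibility)
set.seed(42)
traditional <- rnorm(30, mean = 75, sd = 8)
new_method <- rnorm(30, mean = 81, sd = 7)

# Long format: one row per student, labelled by study method
data <- tibble(
  method = c(rep("Traditional", 30), rep("New Method", 30)),
  score = c(traditional, new_method)
)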

Running the t-test

We test whether the new study method leads to higher exam scores than the traditional study method. This is a right-tailed t-test: it tests whether the new group's mean is greater. The code below runs the t-test.

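# "New Method" sorts before "Traditional", so it is the first group and
# alternative = "greater" tests mean(New Method) > mean(Traditional).
tt <- t.test(score ~ method, data = data, alternative = "greater")
t_value <- unname(tt$statistic)
p_value <- tt$p.value

cat("t-value:", round(t_value, 3),
    "p-value:", round(p_value, 4))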

## t-value: 2.024 p-value: 0.024

Results:
- t-value: how far apart the two group means are, measured in standard errors
- p-value: the probability of seeing a difference at least this large if H₀ were true

If p < 0.05, reject H₀ and conclude that the new method improves exam scores.

Boxplot of Exam Scores

If the box for the new method sits higher (the line inside each box marks the median, not the mean), this suggests that students scored better using that method.
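The plotting code for this slide is not shown, so the following is only a sketch of how such a boxplot could be drawn. It uses base R graphics so that it runs without extra packages, regenerates the simulated scores from the earlier example, and overlays the group means as red points. The data frame name `scores` and the styling choices are assumptions.

```r
# Recreate the simulated data from the example (same seed and parameters)
set.seed(42)
traditional <- rnorm(30, mean = 75, sd = 8)
new_method  <- rnorm(30, mean = 81, sd = 7)

scores <- data.frame(
  method = rep(c("Traditional", "New Method"), each = 30),
  score  = c(traditional, new_method)
)

# One box per study method; the thick line inside each box is the median
boxplot(score ~ method, data = scores,
        xlab = "Study method", ylab = "Exam score",
        main = "Exam Scores by Study Method")

# Overlay the group means (same alphabetical order as the boxes)
group_means <- tapply(scores$score, scores$method, mean)
points(seq_along(group_means), group_means, pch = 18, col = "red")
```

Showing the mean alongside the median makes it easier to relate the picture to the t-test, which compares means.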

Sampling distribution

We can bootstrap the data to see how the difference in averages varies from sample to sample. The code below runs the bootstrap in R:

library(tibble)
library(ggplot2)

# Resample each group with replacement 3000 times and record the
# difference in means (Traditional minus New Method)
set.seed(10)
boot_diffs <- replicate(3000, {
  mean(sample(traditional, 30, replace = TRUE)) -
    mean(sample(new_method, 30, replace = TRUE))
})
boot_df <- tibble(diff = boot_diffs)
obs_diff <- mean(traditional) - mean(new_method)  # observed difference

# Histogram of bootstrap differences; dashed line marks the observed value
ggplot(boot_df, aes(x = diff)) +
  geom_histogram(bins = 40, fill = "lightblue") +
  geom_vline(xintercept = obs_diff, color = "green", linetype = "dashed") +
  labs(title = "Bootstrap Differences in Mean Exam Scores",
       x = "Mean Difference (Traditional – New Method)",
       y = "Frequency") +
  theme_minimal(base_size = 14)
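One common way to summarise a bootstrap distribution like this is with its standard deviation and a percentile interval. The sketch below is not part of the original analysis. It regenerates the simulated data and the bootstrap differences so that it runs on its own, and the names `boot_se` and `boot_ci` are assumptions.

```r
# Recreate the simulated data and bootstrap differences from the example
set.seed(42)
traditional <- rnorm(30, mean = 75, sd = 8)
new_method  <- rnorm(30, mean = 81, sd = 7)

set.seed(10)
boot_diffs <- replicate(3000, {
  mean(sample(traditional, 30, replace = TRUE)) -
    mean(sample(new_method, 30, replace = TRUE))
})

# Spread of the bootstrap differences (a bootstrap standard error)
boot_se <- sd(boot_diffs)

# Middle 95% of the bootstrap differences (a percentile interval)
boot_ci <- quantile(boot_diffs, probs = c(0.025, 0.975))

boot_se
boot_ci
```

Because the difference is defined as Traditional minus New Method, an interval lying entirely below zero would point in the same direction as the t-test: higher scores under the new method.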

Plotly Code

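library(plotly)

# Slide styling only: emits CSS that shrinks the code font
cat("<style>pre code{font-size:14px;}</style>")

# Re-run the right-tailed t-test and extract t and the degrees of freedom
tt <- t.test(score ~ method, data = data, alternative = "greater")
t_val <- as.numeric(tt$statistic)
df <- as.numeric(tt$parameter)

# Density of the t-distribution over a grid of t values
x <- seq(-5, 5, 0.01)
y <- dt(x, df)
pval <- pt(t_val, df, lower.tail = FALSE)  # right-tail area beyond t_val

# Grid points at or beyond the observed t (the region to shade)
x_right <- x[x >= t_val]

plot_ly() |>
  add_lines(x = x, y = y, name = "t-curve") |>
  add_polygons(x = c(x_right, rev(x_right)),
               y = c(dt(x_right, df), rep(0, length(x_right))),
               name = "p-value area", opacity = 0.5) |>
  layout(title = paste("p-value =", signif(pval, 3)),
         xaxis = list(title = "t values"),
         yaxis = list(title = "Density"))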

Plotly graph

The plot shows the t-distribution curve with a shaded area representing the p-value. Small shaded area = small p-value = reject H₀.

p-value

The p-value is the chance of seeing results at least this extreme if H₀ were true. \[ p = P(T_{\text{df}} \ge t_{\text{obs}}) \]

If \(p < \alpha\), reject H₀ (evidence that the new method helps). If \(p \ge \alpha\), fail to reject H₀ (not enough evidence).
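The decision rule can be sketched in a few lines of R. This is a minimal illustration that regenerates the simulated scores, computes the right-tail probability with `pt()`, and compares it with a significance level of 0.05. The names `t_obs`, `df_obs`, `p_manual`, `alpha`, and `decision` are assumptions introduced here.

```r
# Recreate the simulated data from the example
set.seed(42)
traditional <- rnorm(30, mean = 75, sd = 8)
new_method  <- rnorm(30, mean = 81, sd = 7)

# Right-tailed Welch t-test: is the new-method mean greater?
tt <- t.test(new_method, traditional, alternative = "greater")
t_obs  <- unname(tt$statistic)
df_obs <- unname(tt$parameter)

# p = P(T_df >= t_obs): upper-tail area of the t-distribution
p_manual <- pt(t_obs, df_obs, lower.tail = FALSE)

# Should match the p-value reported by t.test()
all.equal(p_manual, tt$p.value)

# Apply the decision rule at significance level alpha
alpha <- 0.05
decision <- if (p_manual < alpha) "reject H0" else "fail to reject H0"
decision
```

Fixing `alpha` before looking at the data keeps the rule honest: the threshold is part of the test design, not something tuned to the result.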

Conclusion

Hypothesis testing helps decide whether results are real or due to chance.
In our example, we compared exam scores under a new study method and a traditional one.
A low p-value (< 0.05) means the results are unlikely to be due to random chance alone, so the null hypothesis is rejected.