2025-11-09


What is Hypothesis Testing?

Hypothesis testing is a way of using data to make decisions. We test a statement (a hypothesis) about a population using a sample.

Example: We want to know if a new study method improves students' exam scores compared to a traditional method.

We test this using hypothesis testing:

  • Collect data
  • Compare averages
  • Decide whether the difference is real or due to random chance

Two types of Hypotheses

  • Null hypothesis (H₀): The new method has no effect. \(H_0: \mu_{new} = \mu_{old}\)

  • Alternative hypothesis (H₁): The new method improves scores. \(H_1: \mu_{new} > \mu_{old}\)

Math

For two groups of students: \[ t = \frac{\bar{x}_{new} - \bar{x}_{old}}{\sqrt{\frac{s_{new}^2}{n_{new}} + \frac{s_{old}^2}{n_{old}}}} \]

If the difference between the averages is large relative to its standard error (t is large and positive for this right-tailed test), it is unlikely to have happened by chance alone, and H₀ is rejected.
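As a rough illustration, the statistic above can be computed by hand in R and compared with the built-in `t.test()`. This is a minimal sketch on simulated scores. The seed, group sizes, means, and standard deviations are assumptions chosen for illustration, not real exam data, and the variable names are hypothetical.

```r
# Simulated scores (assumed values, for illustration only)
set.seed(42)
old_scores <- rnorm(30, mean = 75, sd = 8)
new_scores <- rnorm(30, mean = 81, sd = 7)

# Numerator: difference between the two sample means
mean_diff <- mean(new_scores) - mean(old_scores)

# Denominator: standard error of the difference (variances not pooled)
std_err <- sqrt(var(new_scores) / length(new_scores) +
                var(old_scores) / length(old_scores))

t_manual <- mean_diff / std_err

# t.test() uses the same unequal-variance (Welch) statistic by default
t_builtin <- unname(t.test(new_scores, old_scores)$statistic)

cat("manual t:", round(t_manual, 3), " t.test t:", round(t_builtin, 3), "\n")
```

The two values should agree, which suggests the formula on this slide is the one `t.test()` applies when equal variances are not assumed.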

Example

Study Methods vs Test Scores: 30 students in each group (old/new study method) are tested. The code below creates the data set. The data are randomly simulated, not real exam results.

library(tibble)

# Simulate exam scores for 30 students per group
set.seed(42)
traditional <- rnorm(30, mean = 75, sd = 8)
new_method <- rnorm(30, mean = 81, sd = 7)

# Long format: one row per student
data <- tibble(
  method = c(rep("Traditional", 30), rep("New Method", 30)),
  score = c(traditional, new_method)
)

Running the t-test

We test whether the new study method leads to higher exam scores than the traditional study method. This is a right-tailed t-test: it tests whether the new group's mean is greater. Code to run the t-test:

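# Right-tailed Welch t-test. With the formula interface the groups are ordered
# alphabetically, so the tested difference is "New Method" minus "Traditional".
tt <- t.test(score ~ method, data = data, alternative = "greater")
t_value <- unname(tt$statistic)
p_value <- tt$p.value

cat("t-value:", round(t_value, 3),
    "p-value:", round(p_value, 4))
## t-value: 2.024 p-value: 0.024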

Results:

  • t-value: how far apart the two group means are, measured in standard errors
  • p-value: the probability of seeing a difference at least this large if H₀ were true

If p < 0.05, reject H₀ and conclude that the new method improves exam scores.

Boxplot of Exam Scores

A boxplot marks each group's median with a line inside the box. If the new method's box and median line sit higher, this suggests the students scored better using that method.
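A boxplot like the one described here could be drawn as follows. This is only a sketch: it uses base R's `boxplot()` to stay dependency-free, which is an assumption of this example and not necessarily how the original figure was produced. The data frame name `scores` is hypothetical, and the scores are simulated with the same settings as the example data.

```r
# Simulated scores, same settings as the example data (not real results)
set.seed(42)
scores <- data.frame(
  method = rep(c("Traditional", "New Method"), each = 30),
  score  = c(rnorm(30, mean = 75, sd = 8), rnorm(30, mean = 81, sd = 7))
)

# One box per study method; the thick line inside each box is the median
bp <- boxplot(score ~ method, data = scores,
              main = "Exam Scores by Study Method",
              xlab = "Study method", ylab = "Exam score")

# Numeric summaries to read alongside the plot
group_medians <- tapply(scores$score, scores$method, median)
group_means   <- tapply(scores$score, scores$method, mean)
print(round(rbind(median = group_medians, mean = group_means), 2))
```

Printing the medians and means next to the plot makes it easier to see which summary the box is actually showing.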

Sampling distribution

We can bootstrap the data to see how the difference in averages varies from sample to sample. Code to run the bootstrap in R:

library(tibble)
library(ggplot2)

# Resample each group with replacement and record the difference in means
set.seed(10)
boot_diffs <- replicate(3000, {
  mean(sample(traditional, 30, replace = TRUE)) -
    mean(sample(new_method, 30, replace = TRUE))
})
boot_df <- tibble(diff = boot_diffs)
obs_diff <- mean(traditional) - mean(new_method)  # observed difference

# Histogram of bootstrap differences; dashed line marks the observed difference
ggplot(boot_df, aes(x = diff)) +
  geom_histogram(bins = 40, fill = "lightblue") +
  geom_vline(xintercept = obs_diff, color = "green", linetype = "dashed") +
  labs(title = "Bootstrap Differences in Mean Exam Scores",
       x = "Mean Difference (Traditional – New Method)",
       y = "Frequency") +
  theme_minimal(base_size = 14)
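One way to summarise the bootstrap distribution numerically is a percentile interval, the range that holds the middle 95% of the resampled differences. The interval is an addition for illustration and is not part of the original analysis. The sketch below rebuilds the simulated data so that it runs on its own, and the name `ci_95` is hypothetical.

```r
# Simulated scores, same settings as the example data (not real results)
set.seed(42)
traditional <- rnorm(30, mean = 75, sd = 8)
new_method  <- rnorm(30, mean = 81, sd = 7)

# Resample each group with replacement and record the difference in means
set.seed(10)
boot_diffs <- replicate(3000, {
  mean(sample(traditional, 30, replace = TRUE)) -
    mean(sample(new_method, 30, replace = TRUE))
})

obs_diff <- mean(traditional) - mean(new_method)

# Percentile interval: the middle 95% of the bootstrap differences
ci_95 <- quantile(boot_diffs, probs = c(0.025, 0.975))

cat("Observed difference:", round(obs_diff, 2), "\n")
cat("95% bootstrap interval:", round(ci_95, 2), "\n")
```

If the interval excludes zero, the resampled differences rarely change sign, which points the same way as a small p-value from the t-test.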

Plotly Code

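library(plotly)

# Styling for the rendered HTML slides only; not needed for the plot
cat("<style>pre code{font-size:14px;}</style>")

# Re-run the right-tailed test and pull out the statistic and degrees of freedom
tt <- t.test(score ~ method, data = data, alternative = "greater")
t_val <- as.numeric(tt$statistic)
df <- as.numeric(tt$parameter)

# Density of the t-distribution over a grid of t values
x <- seq(-5, 5, 0.01)
y <- dt(x, df)
pval <- pt(t_val, df, lower.tail = FALSE)  # upper-tail area = p-value

# Grid points at or beyond the observed t (the region to shade)
x_right <- x[x >= t_val]

plot_ly() |>
  add_lines(x = x, y = y, name = "t-curve") |>
  add_polygons(x = c(x_right, rev(x_right)),
               y = c(dt(x_right, df), rep(0, length(x_right))),
               name = "p-value area", opacity = 0.5) |>
  layout(title = paste("p-value =", signif(pval, 3)),
         xaxis = list(title = "t values"),
         yaxis = list(title = "Density"))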

Plotly graph

The plot shows the t-distribution curve with a shaded area representing the p-value. Small shaded area = small p-value = reject H₀.

p-value

The p-value is the chance of seeing results at least this extreme if H₀ were true. \[ p = P(T_{\text{df}} \ge t_{\text{obs}}) \]

If \(p < \alpha\), reject H₀ (evidence that the new method helps). If \(p \ge \alpha\), fail to reject H₀ (not enough evidence).
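The tail probability and the decision rule can be sketched in a few lines of R. The significance level `alpha = 0.05` and the variable names are assumptions for illustration, and the scores are simulated with the same settings as the example data, not real results.

```r
# Simulated scores, same settings as the example data (not real results)
set.seed(42)
traditional <- rnorm(30, mean = 75, sd = 8)
new_method  <- rnorm(30, mean = 81, sd = 7)

alpha <- 0.05  # significance level (assumed; fixed before looking at the data)

# Right-tailed Welch t-test: is the new-method mean greater?
tt <- t.test(new_method, traditional, alternative = "greater")
t_obs  <- unname(tt$statistic)
df_obs <- unname(tt$parameter)

# p = P(T_df >= t_obs): upper-tail area of the t-distribution
p_manual <- pt(t_obs, df_obs, lower.tail = FALSE)

# Apply the decision rule
decision <- if (p_manual < alpha) "reject H0" else "fail to reject H0"
cat("p-value:", signif(p_manual, 3), "->", decision, "\n")
```

Computing the tail area with `pt()` directly should reproduce the p-value that `t.test()` reports for the right-tailed alternative.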

Conclusion

  • Hypothesis testing helps decide whether results are real or due to chance
  • In our example, we tested whether a new study method improves exam scores
  • A low p-value (< 0.05) means results this extreme would be unlikely if the null hypothesis were true, so the null hypothesis is rejected.