From hypotheses \(\rightarrow\) test statistic \(\rightarrow\) p-value \(\rightarrow\) decision
- What hypothesis testing is
- One sample t-test example
- Visualizations (ggplot, plotly)
- Decision + Interpretation
February 08, 2026
A teacher claims a study app they made increases average exam scores above 80.
Hypothesis testing uses sample data (the test scores)
from a population (students using the app)
to draw conclusions about how likely the possible outcomes of an experiment are.
Hypothesis testing helps us decide whether the results of an experiment are statistically significant, or whether they could have occurred just by chance.
\(H_0\): \(\mu\) = \(\mu_0\) vs. \(H_1\): \(\mu\) > \(\mu_0\)
Start with the default assumption \(\rightarrow\) the null hypothesis \(H_0\) (here: the app does not raise the average score, so \(\mu = 80\)).
There are 2 errors you can make:
Type 1: \(\alpha = P(\text{reject } H_0 \mid H_0 \text{ is true})\)
Type 2: \(\beta = P(\text{fail to reject }H_0 \mid H_1 \text{ is true})\)
We choose a significance level: \(\alpha\)
\(\alpha\) tells the test: if \(H_0\) is true, we are willing to accept a probability of \(\alpha\) (e.g., a 5% chance when \(\alpha = 0.05\)) of rejecting \(H_0\) by mistake.
\(P(\text{reject } H_0 \mid H_0 \text{ is true}) = \alpha\)
So the probability of making a Type 1 error is exactly \(\alpha\).
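A quick way to see what \(\alpha\) means in practice: simulate many samples where \(H_0\) is actually true and count how often the test rejects. This is only an illustrative sketch; the Normal(80, 7) population below is a made-up stand-in for "the app does nothing".

```r
set.seed(1)
alpha <- 0.05
n_sims <- 10000

# Draw 30 scores from a hypothetical population where H0 holds (mean exactly 80),
# run the one-sided t-test, and record whether it rejects at level alpha
rejects <- replicate(n_sims, {
  fake_scores <- rnorm(30, mean = 80, sd = 7)  # assumed H0 population
  t.test(fake_scores, mu = 80, alternative = "greater")$p.value < alpha
})

mean(rejects)  # close to 0.05: the Type 1 error rate matches alpha
```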
Sample size: \(n = 30\)
| StudentID | Score | StudentID | Score | StudentID | Score |
|---|---|---|---|---|---|
| S1 | 78 | S11 | 75 | S21 | 71 |
| S2 | 82 | S12 | 78 | S22 | 84 |
| S3 | 75 | S13 | 99 | S23 | 89 |
| S4 | 80 | S14 | 82 | S24 | 91 |
| S5 | 76 | S15 | 72 | S25 | 72 |
| S6 | 91 | S16 | 89 | S26 | 88 |
| S7 | 79 | S17 | 74 | S27 | 88 |
| S8 | 73 | S18 | 83 | S28 | 72 |
| S9 | 87 | S19 | 77 | S29 | 80 |
| S10 | 83 | S20 | 90 | S30 | 73 |
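To follow along with the calculations below, the 30 scores from the table can be stored in a vector; the code in this deck refers to it as `exam_scores`.

```r
# Scores S1-S30 from the table above, read down each column pair
exam_scores <- c(
  78, 82, 75, 80, 76, 91, 79, 73, 87, 83,  # S1-S10
  75, 78, 99, 82, 72, 89, 74, 83, 77, 90,  # S11-S20
  71, 84, 89, 91, 72, 88, 88, 72, 80, 73   # S21-S30
)
length(exam_scores)  # 30
```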
Where the sample scores fall relative to the default mean \(\mu_0 = 80\)
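A minimal ggplot sketch of that picture (not necessarily the original plot), assuming `exam_scores` is defined as above: one point per student, with a dashed line at \(\mu_0 = 80\).

```r
library(ggplot2)

score_df <- data.frame(student = seq_along(exam_scores), score = exam_scores)

ggplot(score_df, aes(x = student, y = score)) +
  geom_point() +
  geom_hline(yintercept = 80, linetype = "dashed", colour = "red") +
  labs(x = "Student", y = "Exam score",
       title = "Sample scores vs. the default mean of 80")
```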
Test Statistic \(t = \frac{\bar{x} - \mu_0}{\frac {s}{\sqrt{n}}}\)
`x_bar <- round(mean(exam_scores), digits = 3)`
`sd_data <- round(sd(exam_scores), digits = 3)`
Sample size \(n = 30\) and \(\mu_0 = 80\)
Test Statistic \(t = \frac{\bar{x} - \mu_0}{\frac {s}{\sqrt{n}}}\)
\(t = \frac{82.3 - 80}{\frac{7.349}{\sqrt{30}}} = \frac{2.3}{1.342}\)
\(t = 1.714\)
The p-value: the probability of seeing a test statistic \(T\) at least as large as the one we observed, assuming \(H_0\) is true.
One-sided p-value: \(p = P(T \geq t_{obs}) \text{ for } H_1: \mu > \mu_0\)
```r
t_obs <- (mean(exam_scores) - 80) /
  (sd(exam_scores) / sqrt(length(exam_scores)))

p_value <- pt(t_obs, df = length(exam_scores) - 1, lower.tail = FALSE)
p_value
#> [1] 0.04858552
```

p-value \(\approx 0.0486\)
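As a cross-check, R's built-in `t.test()` runs the same one-sample, one-sided test in a single call and should report the same \(t\) statistic and p-value (assuming `exam_scores` holds the data used above):

```r
# One-sample t-test of H0: mu = 80 vs H1: mu > 80
t.test(exam_scores, mu = 80, alternative = "greater")
```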
`t_crit <- qt(1 - alpha, df = n - 1)`
Plot our \(t\) value and see whether it falls to the right or the left of the critical value
Everything to the right of the critical value \(t_{crit}\) is in the “reject the null hypothesis” zone
Our p-value calculated how much area under the curve is to the right of our test statistic
So, if the p-value < \(\alpha\) we know that our test statistic falls in the rejection zone
p-value = 0.049 < α = 0.05
Therefore, we reject the null hypothesis.
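Equivalently, we can compare the observed \(t\) directly with the critical value (a quick sketch, reusing `t_obs` from the chunk above with \(\alpha = 0.05\) and \(n = 30\)):

```r
alpha <- 0.05
n <- 30
t_crit <- qt(1 - alpha, df = n - 1)  # about 1.699

t_obs > t_crit  # TRUE for t_obs = 1.714: it falls in the rejection zone, so reject H0
```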
Power is the probability a hypothesis test correctly identifies a real effect.
Power increases as:
- the sample size \(n\) increases (more information)
- the effect size increases (the bigger the difference, the easier it is to detect)

The effect size is the difference between the null hypothesis value \(\mu_0\) and the true (alternative) mean.
A generic power surface (with sd = 1), and one computed with our example data:
```r
library(plotly)

alpha <- 0.05
samp_size_grid <- seq(5, 80, by = 2)
delta_grid <- seq(0, 12, by = 0.25)

# Power over a grid of effect sizes (rows) and sample sizes (columns), with sd = 1
power_generic <- outer(
  delta_grid, samp_size_grid,
  function(d, n) power.t.test(
    n = n, delta = d, sd = 1,
    sig.level = alpha,
    type = "one.sample",
    alternative = "one.sided"
  )$power
)

p_generic <- plot_ly(
  x = samp_size_grid, y = delta_grid, z = power_generic,
  type = "surface"
)
```
```r
# Power at our observed effect size and sample size (x_bar and sd_data from earlier)
mu0 <- 80
delta_hat <- x_bar - mu0        # observed effect size
n_obs <- length(exam_scores)    # n = 30

power_at_obs <- power.t.test(
  n = n_obs, delta = delta_hat, sd = sd_data,
  sig.level = alpha,
  type = "one.sample",
  alternative = "one.sided"
)$power

# Power surface over the same grids, using our sample's standard deviation
power_matrix <- outer(
  delta_grid, samp_size_grid,
  function(d, n) power.t.test(
    n = n, delta = d, sd = sd_data,
    sig.level = alpha,
    type = "one.sample",
    alternative = "one.sided"
  )$power
)

power_plot <- plot_ly(
  x = samp_size_grid, y = delta_grid, z = power_matrix,
  type = "surface"
) %>%
  add_markers(
    x = n_obs,
    y = delta_hat,
    z = power_at_obs,
    marker = list(size = 6, color = "red"),
    name = "Our study (n, δ̂)"
  )
```
Power curves are S-shaped: power starts near \(\alpha\) for tiny effects, climbs steeply as the effect size (or sample size) grows, and then levels off near 1.
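One way to see that shape (a quick sketch): hold \(n = 30\) fixed and plot power against the effect size, using the same test settings and the sample standard deviation (7.349) from above.

```r
library(ggplot2)

delta_seq <- seq(0, 8, by = 0.1)
power_seq <- power.t.test(
  n = 30, delta = delta_seq, sd = 7.349,
  sig.level = 0.05, type = "one.sample", alternative = "one.sided"
)$power

ggplot(data.frame(delta = delta_seq, power = power_seq), aes(x = delta, y = power)) +
  geom_line() +
  labs(x = "Effect size (delta)", y = "Power",
       title = "Power rises along an S-shaped curve as the effect size grows")
```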