From hypotheses \(\rightarrow\) test statistic \(\rightarrow\) p-value \(\rightarrow\) decision
- What hypothesis testing is
- One sample t-test example
- Visualizations (ggplot, plotly)
- Decision + Interpretation
February 08, 2026
A teacher claims a study app they made increases average exam scores above 80.
Hypothesis testing uses sample data (the test scores)
from a population (students using the app)
to draw conclusions about how likely the possible outcomes of an experiment are.
Hypothesis testing helps us decide whether the results of an experiment are statistically significant, or whether they could have occurred just by chance.
\(H_0\): \(\mu\) = \(\mu_0\) vs. \(H_1\): \(\mu\) > \(\mu_0\)
Start with the default assumption \(\rightarrow\) the null hypothesis \(H_0\) (here: the app does not raise the average score, so \(\mu = 80\)).
There are 2 errors you can make:
Type 1: \(\alpha = P(\text{reject } H_0 \mid H_0 \text{ is true})\)
Type 2: \(\beta = P(\text{fail to reject }H_0 \mid H_1 \text{ is true})\)
We choose a significance level: \(\alpha\)
\(\alpha\) tells the test: if \(H_0\) is true, we are willing to accept a probability of \(\alpha\) (e.g., a 5% chance when \(\alpha = 0.05\)) of rejecting \(H_0\) by mistake.
\(P(\text{reject } H_0 \mid H_0 \text{ is true}) = \alpha\)
So the probability of making a Type 1 error is exactly \(\alpha\).
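A quick way to see what \(\alpha\) means in practice: simulate many samples where \(H_0\) is actually true and count how often the test rejects. This is only an illustrative sketch; the Normal(80, 7) population below is a made-up stand-in for "the app does nothing".

```r
set.seed(1)
alpha <- 0.05
n_sims <- 10000

# Draw 30 scores from a hypothetical population where H0 holds (mean exactly 80),
# run the one-sided t-test, and record whether it rejects at level alpha
rejects <- replicate(n_sims, {
  fake_scores <- rnorm(30, mean = 80, sd = 7)  # assumed H0 population
  t.test(fake_scores, mu = 80, alternative = "greater")$p.value < alpha
})

mean(rejects)  # close to 0.05: the Type 1 error rate matches alpha
```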
Sample size: \(n = 30\)
| StudentID | Score | StudentID | Score | StudentID | Score |
|---|---|---|---|---|---|
| S1 | 78 | S11 | 75 | S21 | 71 |
| S2 | 82 | S12 | 78 | S22 | 84 |
| S3 | 75 | S13 | 99 | S23 | 89 |
| S4 | 80 | S14 | 82 | S24 | 91 |
| S5 | 76 | S15 | 72 | S25 | 72 |
| S6 | 91 | S16 | 89 | S26 | 88 |
| S7 | 79 | S17 | 74 | S27 | 88 |
| S8 | 73 | S18 | 83 | S28 | 72 |
| S9 | 87 | S19 | 77 | S29 | 80 |
| S10 | 83 | S20 | 90 | S30 | 73 |
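To follow along with the calculations below, the 30 scores from the table can be stored in a vector; the code in this deck refers to it as `exam_scores`.

```r
# Scores S1-S30 from the table above, read down each column pair
exam_scores <- c(
  78, 82, 75, 80, 76, 91, 79, 73, 87, 83,  # S1-S10
  75, 78, 99, 82, 72, 89, 74, 83, 77, 90,  # S11-S20
  71, 84, 89, 91, 72, 88, 88, 72, 80, 73   # S21-S30
)
length(exam_scores)  # 30
```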
Where the sample scores fall relative to the default mean \(\mu_0 = 80\)
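A minimal ggplot sketch of that picture (not necessarily the original plot), assuming `exam_scores` is defined as above: one point per student, with a dashed line at \(\mu_0 = 80\).

```r
library(ggplot2)

score_df <- data.frame(student = seq_along(exam_scores), score = exam_scores)

ggplot(score_df, aes(x = student, y = score)) +
  geom_point() +
  geom_hline(yintercept = 80, linetype = "dashed", colour = "red") +
  labs(x = "Student", y = "Exam score",
       title = "Sample scores vs. the default mean of 80")
```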
Test Statistic \(t = \frac{\bar{x} - \mu_0}{\frac {s}{\sqrt{n}}}\)
`x_bar <- round(mean(exam_scores), digits = 3)`
`sd_data <- round(sd(exam_scores), digits = 3)`
Sample size \(n = 30\) and \(\mu_0 = 80\)
Test Statistic \(t = \frac{\bar{x} - \mu_0}{\frac {s}{\sqrt{n}}}\)
\(t = \frac{82.3 - 80}{\frac{7.349}{\sqrt{30}}} = \frac{2.3}{1.342}\)
\(t = 1.714\)
The p-value: the probability of seeing a test statistic \(T\) at least as large as the one we observed, assuming \(H_0\) is true.
One-sided p-value: \(p = P(T \geq t_{obs}) \text{ for } H_1: \mu > \mu_0\)
```r
t_obs <- (mean(exam_scores) - 80) /
  (sd(exam_scores) / sqrt(length(exam_scores)))

p_value <- pt(t_obs, df = length(exam_scores) - 1, lower.tail = FALSE)
p_value
#> [1] 0.04858552
```

p-value \(\approx 0.0486\)
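As a cross-check, R's built-in `t.test()` runs the same one-sample, one-sided test in a single call and should report the same \(t\) statistic and p-value (assuming `exam_scores` holds the data used above):

```r
# One-sample t-test of H0: mu = 80 vs H1: mu > 80
t.test(exam_scores, mu = 80, alternative = "greater")
```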
`t_crit <- qt(1 - alpha, df = n - 1)`
Plot our \(t\) value and see whether it falls to the right or the left of the critical value
Everything to the right of the critical value \(t_{crit}\) is in the “reject the null hypothesis” zone
Our p-value calculated how much area under the curve is to the right of our test statistic
So, if the p-value < \(\alpha\) we know that our test statistic falls in the rejection zone
p-value = 0.049 < α = 0.05
Therefore, we reject the null hypothesis.
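Equivalently, we can compare the observed \(t\) directly with the critical value (a quick sketch, reusing `t_obs` from the chunk above with \(\alpha = 0.05\) and \(n = 30\)):

```r
alpha <- 0.05
n <- 30
t_crit <- qt(1 - alpha, df = n - 1)  # about 1.699

t_obs > t_crit  # TRUE for t_obs = 1.714: it falls in the rejection zone, so reject H0
```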
Power is the probability a hypothesis test correctly identifies a real effect.
Power increases as:
- the sample size \(n\) increases (more information)
- the effect size increases (the bigger the difference, the easier it is to detect)

The effect size is the difference between the null hypothesis value \(\mu_0\) and the true (alternative) mean.
A generic power surface (with sd = 1), and one computed with our example data:
```r
library(plotly)

alpha <- 0.05
samp_size_grid <- seq(5, 80, by = 2)
delta_grid <- seq(0, 12, by = 0.25)

# Power over a grid of effect sizes (rows) and sample sizes (columns), with sd = 1
power_generic <- outer(
  delta_grid, samp_size_grid,
  function(d, n) power.t.test(
    n = n, delta = d, sd = 1,
    sig.level = alpha,
    type = "one.sample",
    alternative = "one.sided"
  )$power
)

p_generic <- plot_ly(
  x = samp_size_grid, y = delta_grid, z = power_generic,
  type = "surface"
)
```
```r
# Power at our observed effect size and sample size (x_bar and sd_data from earlier)
mu0 <- 80
delta_hat <- x_bar - mu0        # observed effect size
n_obs <- length(exam_scores)    # n = 30

power_at_obs <- power.t.test(
  n = n_obs, delta = delta_hat, sd = sd_data,
  sig.level = alpha,
  type = "one.sample",
  alternative = "one.sided"
)$power

# Power surface over the same grids, using our sample's standard deviation
power_matrix <- outer(
  delta_grid, samp_size_grid,
  function(d, n) power.t.test(
    n = n, delta = d, sd = sd_data,
    sig.level = alpha,
    type = "one.sample",
    alternative = "one.sided"
  )$power
)

power_plot <- plot_ly(
  x = samp_size_grid, y = delta_grid, z = power_matrix,
  type = "surface"
) %>%
  add_markers(
    x = n_obs,
    y = delta_hat,
    z = power_at_obs,
    marker = list(size = 6, color = "red"),
    name = "Our study (n, δ̂)"
  )
```
Power curves are S-shaped: power starts near \(\alpha\) for tiny effects, climbs steeply as the effect size (or sample size) grows, and then levels off near 1.
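One way to see that shape (a quick sketch): hold \(n = 30\) fixed and plot power against the effect size, using the same test settings and the sample standard deviation (7.349) from above.

```r
library(ggplot2)

delta_seq <- seq(0, 8, by = 0.1)
power_seq <- power.t.test(
  n = 30, delta = delta_seq, sd = 7.349,
  sig.level = 0.05, type = "one.sample", alternative = "one.sided"
)$power

ggplot(data.frame(delta = delta_seq, power = power_seq), aes(x = delta, y = power)) +
  geom_line() +
  labs(x = "Effect size (delta)", y = "Power",
       title = "Power rises along an S-shaped curve as the effect size grows")
```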