February 08, 2026

Hypothesis Testing with a One-Sample t-Test

From hypotheses \(\rightarrow\) test statistic \(\rightarrow\) p-value \(\rightarrow\) decision

    • What hypothesis testing is
    • One-sample t-test example
    • Visualizations (ggplot, plotly)
    • Decision + Interpretation

What is Hypothesis Testing?

Example:

A teacher claims a study app they made increases average exam scores above 80.

Is it true?

Hypothesis testing uses sample data (test scores) from a population (students using the app) to draw conclusions about how likely the possible outcomes of an experiment are.

Hypothesis testing helps us decide whether the results of an experiment are significant or whether they occurred just by chance.

Does the app really increase scores, or is the teacher just trying to sell their app?

Hypotheses and Errors

From hypotheses

Null and Alternative Hypotheses:

\(H_0\): \(\mu\) = \(\mu_0\) vs. \(H_1\): \(\mu\) > \(\mu_0\)

  • Start with the default assumption \(\rightarrow\) the null hypothesis:
  • \(H_0\): \(\mu\) = \(\mu_0\)

  • \(H_0\): average scores of students using the app show no significant change

  • Then look for evidence against it \(\rightarrow\) the alternative hypothesis:
  • \(H_1\): \(\mu\) > \(\mu_0\)

  • \(H_1\): average exam scores increased due to app usage

Hypotheses and Errors

Type 1 and 2 Errors

There are two types of error you can make:

  • Type 1: \(\alpha = P(\text{reject } H_0 \mid H_0 \text{ is true})\)

  • Type 2: \(\beta = P(\text{fail to reject }H_0 \mid H_1 \text{ is true})\)

Hypotheses, Errors, and the Significance Level

We choose a significance level: \(\alpha\)

  • \(\alpha\) tells the test: if \(H_0\) is true, we’re willing to accept an \(\alpha\) chance (e.g., 5%) of rejecting \(H_0\) by mistake.

  • \(P(\text{reject } H_0 \mid H_0 \text{ is true}) = \alpha\)

  • In other words, we accept an \(\alpha\) probability of making a Type 1 error.
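
This is easy to see by simulation. A minimal sketch, assuming scores under \(H_0\) are roughly normal around 80 (the sd of 7.5 is an illustrative choice): draw many samples where \(H_0\) is true and count how often the test rejects.

set.seed(1)
alpha <- 0.05

# Simulate 10,000 studies where the app truly has no effect (mu = 80)
rejections <- replicate(10000, {
  scores <- rnorm(30, mean = 80, sd = 7.5)  # hypothetical scores under H0
  t.test(scores, mu = 80, alternative = "greater")$p.value < alpha
})

mean(rejections)  # proportion of false rejections; comes out close to alpha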

Our Example

  • Parameter of interest: population mean score \(\mu\)
  • Default comparison: \(\mu_0\) = \(80\)
  • Sample: 30 randomly chosen students using the app \(n=30\)
  • Outcome is a decision at a significance level \(\alpha\) (e.g. 0.05)
  • Run a one-sided t-test

Example data

Sample size: \(n = 30\)

StudentID Score StudentID Score StudentID Score
S1 78 S11 75 S21 71
S2 82 S12 78 S22 84
S3 75 S13 99 S23 89
S4 80 S14 82 S24 91
S5 76 S15 72 S25 72
S6 91 S16 89 S26 88
S7 79 S17 74 S27 88
S8 73 S18 83 S28 72
S9 87 S19 77 S29 80
S10 83 S20 90 S30 73
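
The code on the following slides assumes these scores are stored in a numeric vector; a minimal setup transcribing the table above (read down each StudentID/Score column pair):

# Exam scores for the n = 30 sampled students
exam_scores <- c(78, 82, 75, 80, 76, 91, 79, 73, 87, 83,
                 75, 78, 99, 82, 72, 89, 74, 83, 77, 90,
                 71, 84, 89, 91, 72, 88, 88, 72, 80, 73)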

Distribution of Exam Scores

Where the sample scores fall relative to the default mean \(\mu_0 = 80\)
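
A minimal ggplot2 sketch of this figure, using the exam_scores vector above (the binwidth and colors are illustrative choices):

library(ggplot2)

# Histogram of the sample, with the default mean mu_0 = 80 marked
ggplot(data.frame(score = exam_scores), aes(x = score)) +
  geom_histogram(binwidth = 2, fill = "steelblue", color = "white") +
  geom_vline(xintercept = 80, linetype = "dashed", color = "red") +
  labs(title = "Distribution of Exam Scores",
       x = "Exam score", y = "Count")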

Test Statistic & p-Value

From hypotheses \(\rightarrow\) test statistic

Test Statistic \(t = \frac{\bar{x} - \mu_0}{\frac {s}{\sqrt{n}}}\)

\(\bar{x} =\) sample mean =
`x_bar <- round(mean(exam_scores), digits = 3)`
\(s =\) sample standard deviation =
`s <- round(sd(exam_scores), digits = 3)`

\(n=\) sample size = 30 and \(\mu_0\) = 80

Plugging the sample values into the formula:

  • \(\bar{x} =\) sample mean = 82.3
  • \(s =\) sample standard deviation = 7.349
  • \(n=\) sample size = 30

\(t = \frac{82.3 - 80}{\frac{7.349}{\sqrt{30}}} = \frac{2.3}{1.342}\)

\(t = 1.714\)

Test Statistic & p-Value

From hypotheses \(\rightarrow\) test statistic \(\rightarrow\) p-value

One-sided p-value:

\(p = P(T \geq t_{obs}) \text{ for } H_1: \mu > \mu_0\)

The p-value: the probability of seeing a test statistic \(T\) at least as large as the one we got, assuming \(H_0\) is true.

  • When we run this test, if the app didn’t actually increase grades, how likely is it that we would see a sample mean this far above 80?
  • \(\alpha\): the significance level tells us, in advance, the chance of a Type 1 error we are willing to accept.
  • \(p\): the p-value tells us how likely a result at least this extreme is when \(H_0\) is true, i.e., the Type 1 error risk we take on if we reject based on this result.

Computing the test statistic and the one-sided p-value in R:

# Observed t statistic: (x_bar - mu_0) / (s / sqrt(n))
t_obs <- (mean(exam_scores) - 80) /
  (sd(exam_scores) / sqrt(length(exam_scores)))

# One-sided p-value: area to the right of t_obs under a t distribution
# with n - 1 degrees of freedom
p_value <- pt(t_obs, df = length(exam_scores) - 1, lower.tail = FALSE)
p_value
[1] 0.04858552

Test Statistic & p-Value Visualization

From hypotheses \(\rightarrow\) test statistic \(\rightarrow\) p-value

t-Distribution and Rejection Region

  • Significance level \(\alpha = 0.05\): we want the value where 95% of the distribution is to the left.
  • The dashed line is the critical value: the point on the graph where the area under the curve to the right of it equals 0.05 and the area under the curve to the left of it equals 0.95.
`t_crit <- qt(1 - alpha, df = n - 1)` 

Decision

From hypotheses \(\rightarrow\) test statistic \(\rightarrow\) p-value \(\rightarrow\) decision

Plot our observed t value and see whether it falls to the right or the left of the critical value, as in the sketch below.
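
A minimal sketch of this picture, assuming \(n = 30\), \(\alpha = 0.05\), and the observed \(t = 1.714\) from earlier:

library(ggplot2)

n <- 30
alpha <- 0.05
t_crit <- qt(1 - alpha, df = n - 1)  # critical value, about 1.699
t_obs <- 1.714                       # observed test statistic

curve_df <- data.frame(t = seq(-4, 4, length.out = 400))
curve_df$density <- dt(curve_df$t, df = n - 1)

ggplot(curve_df, aes(t, density)) +
  geom_line() +
  geom_area(data = subset(curve_df, t >= t_crit), fill = "red", alpha = 0.3) +
  geom_vline(xintercept = t_crit, linetype = "dashed") +  # critical value
  geom_vline(xintercept = t_obs, color = "blue") +        # our t statistic
  labs(title = "t-Distribution and Rejection Region", x = "t", y = "density")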

Visualization and Interpretation

From hypotheses \(\rightarrow\) test statistic \(\rightarrow\) p-value \(\rightarrow\) decision

  • Our t statistic falls slightly to the right of the critical value
  • So we reject the null hypothesis
  • At the 5% significance level, we have reason to believe that using the app raised students’ average scores above 80.

The Easy Way

\(\alpha\) vs p-Value

  • Everything to the right of the critical value (the cutoff that leaves area \(\alpha\) in the tail) is the “reject the null hypothesis” zone

  • The p-value is the area under the curve to the right of our test statistic

  • So, if the p-value \(< \alpha\), we know our test statistic falls in the rejection zone

p-value \(= 0.049 < \alpha = 0.05\)

Therefore, we reject the null hypothesis.
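
Even easier: R’s built-in `t.test()` runs the entire procedure in one call. Using the exam_scores vector from earlier, it reports the t statistic, degrees of freedom, and one-sided p-value all at once:

# One-sample, one-sided t-test against mu_0 = 80
t.test(exam_scores, mu = 80, alternative = "greater")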

Power as a Function of n and Effect Size

From hypotheses \(\rightarrow\) test statistic \(\rightarrow\) p-value \(\rightarrow\) decision \(\rightarrow\) comprehension

Why small samples struggle to detect small effects:

 

Power is the probability that a hypothesis test correctly identifies a real effect.

Power increases as:

  • sample size, \(n\), increases (more information)

  • effect size increases (the bigger the difference, the easier it is to detect)

  • the effect size is the difference between the null hypothesis value and the observed value
  • \(\delta = \bar{x} - \mu_0\)
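
R’s power.t.test() computes this directly. A quick single-point check with the summary statistics from our example (\(\delta = 82.3 - 80 = 2.3\), \(s = 7.349\)):

# Power of our one-sided, one-sample test at the observed effect size
power.t.test(n = 30, delta = 2.3, sd = 7.349, sig.level = 0.05,
             type = "one.sample", alternative = "one.sided")$power
# comes out near 0.5: an effect this small is missed about half the time at n = 30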

Power: Function of n and \(\delta\)

From hypotheses \(\rightarrow\) test statistic \(\rightarrow\) p-value \(\rightarrow\) decision \(\rightarrow\) comprehension

A generic power function, and one with our example data:

library(plotly)

alpha <- 0.05
samp_size_grid <- seq(5, 80, by = 2)
delta_grid <- seq(0, 12, by = 0.25)

# Power over a grid of effect sizes (rows) and sample sizes (columns)
power_generic <- outer(
  delta_grid, samp_size_grid,
  function(d, n) power.t.test(
    n = n, delta = d, sd = 1,
    sig.level = alpha,
    type = "one.sample",
    alternative = "one.sided"
  )$power
)

# Generic power surface (sd = 1, so delta is in standard-deviation units)
p_generic <- plot_ly(
  x = samp_size_grid, y = delta_grid, z = power_generic,
  type = "surface"
)

# Our example: observed effect size, sample size, and the power there
x_bar <- mean(exam_scores)
mu0 <- 80
sd_data <- sd(exam_scores)
delta_hat <- x_bar - mu0
n_obs <- length(exam_scores)

power_at_obs <- power.t.test(
  n = n_obs, delta = delta_hat, sd = sd_data,
  sig.level = alpha,
  type = "one.sample",
  alternative = "one.sided"
)$power

# Power surface computed with our sample's sd, so the marker lies on it
power_example <- outer(
  delta_grid, samp_size_grid,
  function(d, n) power.t.test(
    n = n, delta = d, sd = sd_data,
    sig.level = alpha,
    type = "one.sample",
    alternative = "one.sided"
  )$power
)

power_plot <- plot_ly(
  x = samp_size_grid, y = delta_grid, z = power_example,
  type = "surface"
) %>%
  add_markers(
    x = n_obs,
    y = delta_hat,
    z = power_at_obs,
    marker = list(size = 6, color = "red"),
    name = "Our study (n, δ̂)"
  )

Power Function Visualization

From hypotheses \(\rightarrow\) test statistic \(\rightarrow\) p-value \(\rightarrow\) decision \(\rightarrow\) comprehension

Power curves from hypothesis tests trace an S-shaped curve:

  • power rises sharply as sample size or effect size grows
  • then it levels off as power approaches 1
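
To see the S shape directly, here is a minimal 2-D slice of the surface, holding the effect size fixed at our observed \(\hat{\delta} = 2.3\) (with \(s = 7.349\)):

library(ggplot2)

# Power as a function of n alone, for a fixed effect size
n_grid <- 5:100
pow <- sapply(n_grid, function(n)
  power.t.test(n = n, delta = 2.3, sd = 7.349, sig.level = 0.05,
               type = "one.sample", alternative = "one.sided")$power)

ggplot(data.frame(n = n_grid, power = pow), aes(n, power)) +
  geom_line() +
  geom_hline(yintercept = 1, linetype = "dotted") +
  labs(title = "Power vs. sample size (S-shaped curve)",
       x = "sample size n", y = "power")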