What is Hypothesis Testing?

Hypothesis testing is a statistical procedure for making decisions about a population using sample data.

The core idea:

  • Start with a claim about the population
  • Collect data
  • Decide whether the data provides enough evidence to reject that claim

The Two Hypotheses

Every hypothesis test has two competing statements:

Null Hypothesis \(H_0\):

  • The "no effect" claim
  • Always contains an equality: \(=\), \(\leq\), or \(\geq\)

Alternative Hypothesis \(H_a\):

  • The claim we are trying to find evidence for
  • Contains \(\neq\), \(<\), or \(>\)

Example — testing whether a population mean \(\mu\) equals 50:

\[H_0: \mu = 50\]

\[H_a: \mu \neq 50\]

We never prove \(H_0\) — we either reject it or fail to reject it.
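In R, the form of \(H_a\) maps onto the `alternative` argument of `t.test()` (`"two.sided"` for \(\neq\), `"less"` for \(<\), `"greater"` for \(>\)), and \(\mu_0\) is passed as `mu`. The sketch below runs all three versions of the test of \(H_0: \mu = 50\) on a small sample. The numbers in `x` are made up for illustration and are not real data.

```r
# Illustrative sample; these values are made up for demonstration only
x <- c(52.1, 48.7, 51.3, 53.0, 49.8, 50.9, 54.2, 51.7)

# Same H0 (mu = 50), three different alternative hypotheses
p_two  <- t.test(x, mu = 50, alternative = "two.sided")$p.value  # Ha: mu != 50
p_less <- t.test(x, mu = 50, alternative = "less")$p.value       # Ha: mu < 50
p_grt  <- t.test(x, mu = 50, alternative = "greater")$p.value    # Ha: mu > 50

round(c(two.sided = p_two, less = p_less, greater = p_grt), 4)
```

Because the \(t\) distribution is symmetric, the two one-sided p-values sum to 1, and the two-sided p-value is twice the smaller of them.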

The Test Statistic

We summarize the sample data into a single number called the test statistic, which measures how far the sample result is from what \(H_0\) claims.

For a one-sample \(z\)-test (known \(\sigma\)):

\[z = \frac{\bar{X} - \mu_0}{\sigma / \sqrt{n}}\]

For a one-sample \(t\)-test (unknown \(\sigma\)):

\[t = \frac{\bar{X} - \mu_0}{s / \sqrt{n}} \sim t_{n-1}\]

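Where: \(\bar{X}\) = sample mean, \(\mu_0\) = hypothesized mean under \(H_0\), \(\sigma\) = known population standard deviation, \(s\) = sample standard deviation, \(n\) = sample size. When \(H_0\) is true and the data are normally distributed, \(z\) follows the standard normal distribution and \(t\) follows a \(t\) distribution with \(n - 1\) degrees of freedom.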
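Both statistics can be computed by hand in R. This is a minimal sketch on simulated data: the sample, \(\mu_0 = 50\), and the pretend-known \(\sigma = 4\) used for the \(z\)-statistic are all illustrative assumptions. The hand-computed \(t\) should match the statistic reported by `t.test()`.

```r
set.seed(1)
x   <- rnorm(25, mean = 52, sd = 4)  # simulated sample (illustrative)
mu0 <- 50                            # hypothesized mean under H0
n   <- length(x)

# z-statistic: assumes sigma is known (here we pretend sigma = 4)
sigma <- 4
z <- (mean(x) - mu0) / (sigma / sqrt(n))

# t-statistic: sigma unknown, so the sample standard deviation s is used
t_stat <- (mean(x) - mu0) / (sd(x) / sqrt(n))

# Statistic reported by R's built-in one-sample t-test
t_builtin <- unname(t.test(x, mu = mu0)$statistic)

c(z = z, t = t_stat, t_builtin = t_builtin)
```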

Types of Errors

No test is perfect; two types of error can occur:

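| Decision               | \(H_0\) is True              | \(H_0\) is False            |
|------------------------|------------------------------|-----------------------------|
| Fail to Reject \(H_0\) | ✅ Correct                   | ❌ Type II Error (\(\beta\)) |
| Reject \(H_0\)         | ❌ Type I Error (\(\alpha\)) | ✅ Correct (Power)           |
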
  • Type I Error (\(\alpha\)): Rejecting \(H_0\) when it is actually true (“false positive”)
  • Type II Error (\(\beta\)): Failing to reject \(H_0\) when it is actually false (“false negative”)
  • Power = \(1 - \beta\): probability of correctly rejecting a false \(H_0\)
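The meaning of \(\alpha\) as a long-run error rate can be checked by simulation. In this sketch, the sample size of 20, \(\sigma = 10\), and 10,000 replications are arbitrary choices. Every sample comes from a population where \(H_0: \mu = 50\) is true, so every rejection is a Type I error, and the rejection rate should land near \(\alpha = 0.05\).

```r
set.seed(123)
alpha  <- 0.05
n_sims <- 10000

# Each sample comes from N(50, 10^2), so H0: mu = 50 is TRUE;
# any rejection is therefore a Type I error
p_values <- replicate(n_sims,
  t.test(rnorm(20, mean = 50, sd = 10), mu = 50)$p.value)

type1_rate <- mean(p_values <= alpha)
type1_rate  # proportion of false rejections; should be near alpha
```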

The P-Value

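The p-value is the probability of observing a test statistic at least as extreme as the one computed, assuming \(H_0\) is true. For a two-sided test:

\[p\text{-value} = P(|T| \geq |t_{obs}| \mid H_0 \text{ true})\]

where \(T\) is the test statistic treated as a random variable and \(t_{obs}\) is its observed value. For one-sided alternatives, use \(P(T \geq t_{obs})\) when \(H_a: \mu > \mu_0\) and \(P(T \leq t_{obs})\) when \(H_a: \mu < \mu_0\).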
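For a one-sample \(t\)-test, the two-sided p-value \(P(|T| \geq |t_{obs}|)\) with \(T \sim t_{n-1}\) equals \(2\,P(T \geq |t_{obs}|)\), which R's `pt()` computes directly. A sketch on a simulated sample (the data and \(\mu_0 = 100\) are illustrative), compared against `t.test()`:

```r
set.seed(7)
x   <- rnorm(15, mean = 103, sd = 10)  # illustrative simulated sample
mu0 <- 100
n   <- length(x)

t_obs <- (mean(x) - mu0) / (sd(x) / sqrt(n))

# Two-sided p-value: P(|T| >= |t_obs|) = 2 * P(T >= |t_obs|), T ~ t_{n-1}
p_manual <- 2 * pt(abs(t_obs), df = n - 1, lower.tail = FALSE)

p_builtin <- t.test(x, mu = mu0, alternative = "two.sided")$p.value
c(p_manual = p_manual, p_builtin = p_builtin)
```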

Decision rule:

\[\text{If } p\text{-value} \leq \alpha \Rightarrow \text{Reject } H_0\]

\[\text{If } p\text{-value} > \alpha \Rightarrow \text{Fail to Reject } H_0\]
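For a two-sided \(t\)-test, the p-value rule is equivalent to comparing \(|t_{obs}|\) with the critical value \(t_{1-\alpha/2,\,n-1}\), which R returns from `qt()`. The helpers `reject_by_p()` and `reject_by_critical()` below are hypothetical names for illustration, and \(df = 29\) is an arbitrary choice.

```r
# Hypothetical helper: reject H0 when the two-sided p-value <= alpha
reject_by_p <- function(t_obs, df, alpha = 0.05) {
  p <- 2 * pt(-abs(t_obs), df)
  p <= alpha
}

# Hypothetical helper: reject H0 when |t_obs| reaches the critical value
reject_by_critical <- function(t_obs, df, alpha = 0.05) {
  abs(t_obs) >= qt(1 - alpha / 2, df)
}

t_values <- c(-3, -2.1, -1.5, 0, 1.9, 2.5)
data.frame(
  t_obs   = t_values,
  by_p    = reject_by_p(t_values, df = 29),
  by_crit = reject_by_critical(t_values, df = 29)
)
```

At \(df = 29\) the critical value is about 2.045, so \(t = -2.1\) leads to rejection while \(t = 1.9\) does not.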

A small p-value means the observed data is unlikely under \(H_0\), providing evidence against it.

A p-value is not the probability that \(H_0\) is true.

Visualizing the P-Value

Example: One-Sample t-Test in R

A manufacturer claims the mean weight of a product is 500 g. We sample 30 items and test whether the true mean differs from this claim (\(H_0: \mu = 500\) vs. \(H_a: \mu \neq 500\)).

set.seed(42)
# Simulated sample of 30 weights (true mean 492 g, sd 15 g)
weights <- rnorm(30, mean = 492, sd = 15)

# One-sample t-test
t.test(weights, mu = 500, alternative = "two.sided", conf.level = 0.95)
## 
##  One Sample t-test
## 
## data:  weights
## t = -2.0283, df = 29, p-value = 0.05181
## alternative hypothesis: true mean is not equal to 500
## 95 percent confidence interval:
##  485.9993 500.0583
## sample estimates:
## mean of x 
##  493.0288
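The result can also be read programmatically: `t.test()` returns an `htest` object whose `p.value` and `conf.int` components hold the numbers printed above. With \(p = 0.05181 > \alpha = 0.05\), we fail to reject \(H_0\) at the 5% level; consistently, 500 lies inside the 95% confidence interval (485.9993, 500.0583). Because this sample was simulated with a true mean of 492 g, \(H_0\) is actually false here, so this outcome is a Type II error. A sketch of the decision logic (the names `decision` and `mu0_in_ci` are illustrative):

```r
set.seed(42)
weights <- rnorm(30, mean = 492, sd = 15)  # same simulated data as above

res <- t.test(weights, mu = 500, alternative = "two.sided", conf.level = 0.95)

alpha <- 0.05
decision <- if (res$p.value <= alpha) "Reject H0" else "Fail to reject H0"

# Duality: a two-sided test rejects at level alpha exactly when
# mu0 = 500 falls outside the (1 - alpha) confidence interval
mu0_in_ci <- res$conf.int[1] <= 500 && 500 <= res$conf.int[2]

list(p_value = res$p.value, decision = decision, mu0_in_ci = mu0_in_ci)
```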

Visualizing the Sample vs. Claim

Power & Sample Size

Power = probability of rejecting \(H_0\) when it is truly false:

\[\text{Power} = 1 - \beta = P(\text{Reject } H_0 \mid H_a \text{ is true})\]

Power increases when:

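  • The true effect size (the gap between \(\mu\) and \(\mu_0\), relative to \(\sigma\)) is larger
  • Sample size \(n\) is larger
  • Significance level \(\alpha\) is larger (at the cost of more Type I errors)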
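R's `power.t.test()` (base `stats` package) computes the power of a \(t\)-test and can also solve for the sample size needed to reach a target power. This sketch varies each factor listed above in turn. The helper `pow()`, the standard deviation of 15, and the shift \(\delta = 8\) are illustrative assumptions chosen to echo the weight example, not values from a real study.

```r
# Hypothetical helper around stats::power.t.test():
# one-sample, two-sided t-test; sd = 15 is an illustrative assumption
pow <- function(n, delta, alpha = 0.05, sd = 15) {
  power.t.test(n = n, delta = delta, sd = sd, sig.level = alpha,
               type = "one.sample", alternative = "two.sided")$power
}

powers <- c(
  base          = pow(n = 30, delta = 8),               # reference case
  larger_effect = pow(n = 30, delta = 12),              # bigger true effect
  larger_n      = pow(n = 60, delta = 8),               # bigger sample
  larger_alpha  = pow(n = 30, delta = 8, alpha = 0.10)  # looser alpha
)
round(powers, 3)

# Smallest n giving 80% power for delta = 8 under the same assumptions
ss <- power.t.test(delta = 8, sd = 15, sig.level = 0.05, power = 0.80,
                   type = "one.sample", alternative = "two.sided")
ceiling(ss$n)
```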

3D View: Effect Size, Sample Size & Power
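The relationship between effect size, sample size, and power can be tabulated as a grid, which is the kind of surface a 3D view displays. This sketch uses `power.t.test()` with the effect size expressed as Cohen's \(d = \delta / \sigma\) (hence `sd = 1`). The grid ranges are illustrative assumptions, and the commented `persp()` call is one possible way to draw such a surface, not a reproduction of the original figure.

```r
# Power of a one-sample, two-sided t-test at alpha = 0.05 over a grid.
# Effect size is Cohen's d = delta / sd, so sd = 1 here.
d_grid <- seq(0.1, 1.0, by = 0.1)  # effect sizes (illustrative range)
n_grid <- seq(10, 100, by = 10)    # sample sizes (illustrative range)

power_grid <- outer(d_grid, n_grid, Vectorize(function(d, n)
  power.t.test(n = n, delta = d, sd = 1, sig.level = 0.05,
               type = "one.sample")$power))
dimnames(power_grid) <- list(d = d_grid, n = n_grid)

# A few rows (effect sizes) and columns (sample sizes)
round(power_grid[c(2, 5, 8), c(1, 3, 5, 10)], 2)

# Optional 3D surface with base graphics:
# persp(d_grid, n_grid, power_grid, theta = 40, phi = 25,
#       xlab = "effect size d", ylab = "n", zlab = "power")
```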

Summary

| Concept                   | Description                                                     |
|---------------------------|-----------------------------------------------------------------|
| \(H_0\)                   | Null hypothesis: no effect / status quo                         |
| \(H_a\)                   | Alternative hypothesis: the claim we seek evidence for          |
| Test statistic            | Measures how far the data are from what \(H_0\) claims          |
| p-value                   | Probability, under \(H_0\), of a result at least this extreme   |
| Type I error (\(\alpha\)) | False positive: rejecting a true \(H_0\)                        |
| Type II error (\(\beta\)) | False negative: failing to reject a false \(H_0\)               |
| Power                     | \(1 - \beta\), the ability to detect real effects               |