Hypothesis testing is a statistical procedure for making decisions about a population using sample data.
The core idea:
- Start with a claim about the population
- Collect data
- Decide whether the data provides enough evidence to reject that claim
Every hypothesis test has two competing statements:
Null Hypothesis \(H_0\):
- The "no effect" claim
- Always contains an equality: \(=\), \(\leq\), or \(\geq\)
Alternative Hypothesis \(H_a\):
- The claim we are trying to find evidence for
- Contains: \(\neq\), \(<\), or \(>\)
Example — testing whether a population mean \(\mu\) equals 50:
\[H_0: \mu = 50\] \[H_a: \mu \neq 50\]
We never prove \(H_0\) — we either reject it or fail to reject it.
We summarize the sample data into a single number called the test statistic, which measures how far the sample result is from what \(H_0\) claims.
For a one-sample \(z\)-test (known \(\sigma\)):
\[z = \frac{\bar{X} - \mu_0}{\sigma / \sqrt{n}}\]
For a one-sample \(t\)-test (unknown \(\sigma\)):
\[t = \frac{\bar{X} - \mu_0}{s / \sqrt{n}} \sim t_{n-1}\]
Where: \(\bar{X}\) = sample mean, \(\mu_0\) = hypothesized mean under \(H_0\), \(\sigma\) = population standard deviation, \(s\) = sample standard deviation, \(n\) = sample size
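As a minimal sketch of the two formulas above, the statistics can be computed by hand in R. The sample `x`, the hypothesized mean `mu0`, and the "known" `sigma` are made-up illustrative values, not data from this document.

```r
# Hypothetical sample and hypothesized mean (illustrative values only)
x <- c(52.1, 48.3, 55.0, 50.7, 53.4, 49.8, 54.2, 51.6)
mu0 <- 50
sigma <- 3  # assumed known population SD; only the z-test needs it

n <- length(x)
x_bar <- mean(x)
s <- sd(x)  # sample standard deviation (n - 1 in the denominator)

# z statistic: uses the known population SD
z_stat <- (x_bar - mu0) / (sigma / sqrt(n))

# t statistic: replaces sigma with the sample SD
t_stat <- (x_bar - mu0) / (s / sqrt(n))

# Cross-check the hand-computed t against R's built-in test
t_builtin <- unname(t.test(x, mu = mu0)$statistic)

print(c(z = z_stat, t = t_stat, t_builtin = t_builtin))
```

The only difference between the two statistics is the denominator: the \(z\)-test divides by the known \(\sigma\), the \(t\)-test by the estimate \(s\), which is why the latter is compared against a \(t_{n-1}\) distribution.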
No test is perfect. There are two types of errors that can happen:
| | \(H_0\) is True | \(H_0\) is False |
|---|---|---|
| Fail to Reject \(H_0\) | ✅ Correct | ❌ Type II Error (\(\beta\)) |
| Reject \(H_0\) | ❌ Type I Error (\(\alpha\)) | ✅ Correct (Power) |
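The two error rates in the table can be estimated by simulation. The sketch below is one way to do it, assuming normally distributed data; the sample size, means, standard deviation, and number of replications are arbitrary illustrative choices.

```r
set.seed(1)
alpha  <- 0.05
n      <- 30    # sample size per simulated study (assumed)
n_sims <- 2000  # number of simulated studies (assumed)
mu0    <- 50    # hypothesized mean under H0

# Case 1: H0 is true (data really have mean 50).
# Every rejection here is a Type I error.
p_null <- replicate(n_sims, t.test(rnorm(n, mean = 50, sd = 10), mu = mu0)$p.value)
type1_rate <- mean(p_null <= alpha)

# Case 2: H0 is false (true mean is 55, an assumed effect).
# Every failure to reject here is a Type II error.
p_alt <- replicate(n_sims, t.test(rnorm(n, mean = 55, sd = 10), mu = mu0)$p.value)
type2_rate <- mean(p_alt > alpha)

print(c(type1 = type1_rate, type2 = type2_rate, power = 1 - type2_rate))
```

The estimated Type I rate should land close to the chosen \(\alpha\), while the Type II rate depends on how far the true mean sits from \(\mu_0\).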
The p-value is the probability of observing a test statistic at least as extreme as the one computed, assuming \(H_0\) is true. For a two-sided test:
\[p\text{-value} = P(|\text{test stat}| \geq |t_{obs}| \mid H_0 \text{ true})\]
Decision rule:
\[\text{If } p\text{-value} \leq \alpha \Rightarrow \text{Reject } H_0\] \[\text{If } p\text{-value} > \alpha \Rightarrow \text{Fail to Reject } H_0\]
A small p-value means the observed data is unlikely under \(H_0\), providing evidence against it.
A p-value is not the probability that \(H_0\) is true.
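For a two-sided \(t\)-test, the p-value is the area in both tails of the \(t\) distribution beyond \(|t_{obs}|\). A minimal sketch using `pt()`, with a hypothetical observed statistic and degrees of freedom:

```r
t_obs <- 2.5   # hypothetical observed t statistic
df    <- 19    # hypothetical degrees of freedom (n - 1 with n = 20)
alpha <- 0.05

# Two-sided p-value: lower-tail area below -|t_obs|, doubled by symmetry
p_value <- 2 * pt(-abs(t_obs), df = df)

# Decision rule: reject H0 when the p-value is at most alpha
decision <- if (p_value <= alpha) "Reject H0" else "Fail to reject H0"

print(p_value)
print(decision)
```

For a one-sided alternative, the doubling is dropped and only the tail in the direction of \(H_a\) is used.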
A manufacturer claims the mean weight of a product is 500g. We sample 30 items and test whether the true mean differs from 500g:
```r
set.seed(42)
weights <- rnorm(30, mean = 492, sd = 15)

# One-sample t-test
t.test(weights, mu = 500, alternative = "two.sided", conf.level = 0.95)
```

```
##
##  One Sample t-test
##
## data:  weights
## t = -2.0283, df = 29, p-value = 0.05181
## alternative hypothesis: true mean is not equal to 500
## 95 percent confidence interval:
##  485.9993 500.0583
## sample estimates:
## mean of x
##  493.0288
```
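Reading the output against the decision rule: the p-value of 0.05181 is slightly above \(\alpha = 0.05\), so we fail to reject \(H_0\) at the 5% level. The 95% confidence interval (485.9993 to 500.0583) contains 500, which leads to the same conclusion. The sketch below applies the rule programmatically; the variable names `result`, `decision`, and `mu0_in_ci` are illustrative.

```r
set.seed(42)
weights <- rnorm(30, mean = 492, sd = 15)

# Store the test result instead of printing it
result <- t.test(weights, mu = 500, alternative = "two.sided", conf.level = 0.95)

alpha   <- 0.05
p_value <- result$p.value
ci      <- result$conf.int

# Decision rule: reject H0 only when the p-value is at most alpha
decision <- if (p_value <= alpha) "Reject H0" else "Fail to reject H0"

# Equivalent check for a two-sided test: is the hypothesized mean inside the CI?
mu0_in_ci <- ci[1] <= 500 && 500 <= ci[2]

print(decision)
print(mu0_in_ci)
```

Failing to reject does not show that the mean is 500g; it only means this sample of 30 does not provide enough evidence against the claim at the 5% level.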
Power = probability of rejecting \(H_0\) when it is truly false:
\[\text{Power} = 1 - \beta = P(\text{Reject } H_0 \mid H_a \text{ is true})\]
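Power can be computed for a one-sample \(t\)-test with `power.t.test()` from base R's `stats` package. The sketch below assumes a true difference of 8 and a standard deviation of 15 (the values used to simulate the weights example) and an arbitrary set of sample sizes.

```r
sample_sizes <- c(10, 30, 100)  # illustrative sample sizes

# Power of a two-sided one-sample t-test at each sample size,
# assuming a true shift of delta = 8 and sd = 15
powers <- sapply(sample_sizes, function(n) {
  power.t.test(n = n, delta = 8, sd = 15, sig.level = 0.05,
               type = "one.sample", alternative = "two.sided")$power
})
names(powers) <- paste0("n=", sample_sizes)

print(round(powers, 3))
```

Holding the effect size, variability, and \(\alpha\) fixed, the computed power rises with the sample size.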
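Power increases when:
- The sample size \(n\) increases
- The true effect size (the gap between the true mean and \(\mu_0\)) increases
- The significance level \(\alpha\) increases
- The variability in the data (\(\sigma\)) decreases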
| Concept | Description |
|---|---|
| \(H_0\) | Null hypothesis — no effect / status quo |
| \(H_a\) | Alternative hypothesis — what we test for |
| Test statistic | Measures distance of data from \(H_0\) |
| p-value | Probability of data this extreme under \(H_0\) |
| Type I error (\(\alpha\)) | False positive — rejecting true \(H_0\) |
| Type II error (\(\beta\)) | False negative — missing true \(H_a\) |
| Power | \(1 - \beta\), ability to detect real effects |