Module 4: HW3

2025-10-26

Introduction to Hypothesis Testing

Hypothesis testing is a method of making decisions using data.
We test an assumption (the “null hypothesis”) against an alternative.
Common in science, medicine, and engineering.

What is a Hypothesis?

Null Hypothesis (\(H_0\)): Status quo or default claim.
Alternative Hypothesis (\(H_1\) or \(H_a\)): What we hope to support.
The decision is based on sample data, so it is never 100% certain.

Example Scenario

A battery manufacturer claims their batteries last at least 100 hours on average.
From production data, the manufacturer calculated the population standard deviation to be 5 hours.
A competitor tests 50 batteries and gets a sample mean of 98.5 hours.
Is this enough evidence to reject the manufacturer’s claim?
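Stated formally, and assuming the two-sided reading that matches the p-value computed by the R code in these slides, the hypotheses for this scenario could be written as:

\[ H_0: \mu = 100 \qquad \text{versus} \qquad H_1: \mu \neq 100 \]

A strictly one-sided reading of "at least 100 hours" would instead test \(H_0: \mu \ge 100\) against \(H_1: \mu < 100\). That variant is noted here only as an assumption; it is not what the slides compute.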

Key Formula (Math Text Slide 1)

Because the population standard deviation is known, we use a one-sample z-test for the mean to compare the sample mean (obtained by the competitor) to the hypothesized population mean (as claimed by the manufacturer):

\[ z = \frac{\bar{x} - \mu_0}{\sigma / \sqrt{n}} \]

Where:
- \(\bar{x}\) = sample mean
- \(\mu_0\) = claimed population mean
- \(\sigma\) = population standard deviation
- \(n\) = sample size
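
As a worked check, plugging in the numbers from the example scenario (\(\bar{x} = 98.5\), \(\mu_0 = 100\), \(\sigma = 5\), \(n = 50\)) gives:

\[ z = \frac{98.5 - 100}{5 / \sqrt{50}} = \frac{-1.5}{0.7071} \approx -2.12 \]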

Visualizing the Null Distribution (Plotly)

Notation for the Null Distribution plot

This plot displays the standard normal distribution (mean = 0, SD = 1), which is the distribution of the test statistic \(z\) when the null hypothesis \(H_0\) is true.
The black curve shows the shape of the z-distribution under \(H_0\).
The red line at \(z = -2.12\) marks our observed test statistic, calculated from the sample data.
This z-score tells us how many standard errors the observed sample mean (98.5) is below the hypothesized population mean (100).
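
The plot described above could be rebuilt roughly as follows. This is a minimal sketch, not the original slide code: the names `z_obs`, `curve`, and `fig` are illustrative assumptions, as are the grid range of -4 to 4 and the axis titles. The figure is only built if the plotly package is installed.

```r
# Sketch only: names, grid range, and styling are illustrative assumptions.
xbar <- 98.5; mu0 <- 100; sigma <- 5; n <- 50
z_obs <- (xbar - mu0) / (sigma / sqrt(n))   # observed test statistic, about -2.12

# Density of the standard normal (the null distribution of z) on a grid
curve <- data.frame(z = seq(-4, 4, length.out = 401))
curve$density <- dnorm(curve$z)

# Build the figure only if plotly is available
if (requireNamespace("plotly", quietly = TRUE)) {
  # Black curve: shape of the z-distribution under H0
  fig <- plotly::plot_ly(curve, x = ~z, y = ~density,
                         type = "scatter", mode = "lines",
                         line = list(color = "black"))
  # Red vertical line: the observed test statistic
  fig <- plotly::layout(
    fig,
    shapes = list(list(type = "line",
                       x0 = z_obs, x1 = z_obs,
                       y0 = 0, y1 = max(curve$density),
                       line = list(color = "red"))),
    xaxis = list(title = "z"),
    yaxis = list(title = "Density")
  )
  # print(fig) would display the interactive plot
}
```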

Simulation of Sample Means (next Slide)

Here we simulated 1000 random samples of size 50.
Each sample is drawn from a population with true mean 100 (manufacturer claim) and standard deviation of 5.
The plot shows the distribution of the sample means.
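The observed sample mean (98.5) lies in the lower tail of the sampling distribution under \(H_0\) (centered at 100), about 2.12 standard errors below the center. A sample mean this far from 100 is unlikely to occur by chance alone if the claim is true, which suggests statistical significance.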
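
A simulation along these lines might look like the sketch below. The seed, the variable names (`sim_means`, `prop_below`, `p_sim`), and the histogram styling are assumptions made for illustration, not the original slide code. The plot is only built if ggplot2 is installed.

```r
set.seed(42)  # arbitrary seed, chosen here only for reproducibility

n_sims <- 1000; n <- 50
mu0 <- 100; sigma <- 5
xbar_obs <- 98.5

# Mean of each simulated sample of n batteries, drawn as if H0 were true
sim_means <- replicate(n_sims, mean(rnorm(n, mean = mu0, sd = sigma)))

# Share of simulated means at or below the observed mean (lower tail)
prop_below <- mean(sim_means <= xbar_obs)

if (requireNamespace("ggplot2", quietly = TRUE)) {
  # Histogram of simulated means with the observed mean marked in red
  p_sim <- ggplot2::ggplot(data.frame(xbar = sim_means), ggplot2::aes(x = xbar)) +
    ggplot2::geom_histogram(bins = 30, fill = "grey70", colour = "white") +
    ggplot2::geom_vline(xintercept = xbar_obs, colour = "red") +
    ggplot2::labs(x = "Sample mean (hours)", y = "Count")
  # print(p_sim) would draw the histogram
}
```

In theory the standard deviation of the sample means is \(5 / \sqrt{50} \approx 0.707\), and the lower-tail share should be close to 0.017. The simulated values vary with the seed.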

Plot of Sample Means Distribution (ggplot2 Plot 1)

p-value Interpretation (Math Text Slide 2)

The p-value is the probability of getting a test statistic at least as extreme as ours, under \(H_0\).
If \(p < \alpha\) (e.g., \(\alpha = 0.05\)), we reject \(H_0\).
Lower p-values indicate stronger evidence against \(H_0\).
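
For a two-sided test, the definition above can be written with the standard normal CDF \(\Phi\). Using the example's statistic \(z \approx -2.12\), a worked version is:

\[ p = 2\,P(Z \le -|z|) = 2\,\Phi(-2.12) \approx 2 \times 0.017 = 0.034 \]

Since \(0.034 < 0.05\), this p-value falls below the usual threshold.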

R Code to Run z-test (R Code Slide)

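# One-sample z-test for a mean with known population SD (two-sided)
z_test <- function(xbar, mu0, sigma, n) {
  z <- (xbar - mu0) / (sigma / sqrt(n))  # distance from mu0 in standard errors
  p <- 2 * pnorm(-abs(z))                # two-sided p-value from the lower tail
  list(z = z, p = p)
}

# Competitor's sample: mean 98.5, claimed mean 100, sigma 5, n = 50
z_test(98.5, 100, 5, 50)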

## $z
## [1] -2.12132
## 
## $p
## [1] 0.03389485

ggplot2: Visualizing p-values (ggplot2 Plot 2)

Visual Verdict: Reject or Not?

Conclusion

The z statistic (\(z = -2.12\)) falls in the shaded rejection region beyond ±1.96, the two-sided critical values for the conventional α = 0.05, so the result is statistically significant.
In the plot, the red p-value region lies outside the blue line. Equivalently, p < α (0.034 < 0.05), so we reject the null hypothesis, that is, the manufacturer's claim.
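
The two decision rules in the conclusion (critical value and p-value) can be checked side by side. This is a small sketch; `alpha`, `z_crit`, `reject_by_z`, and `reject_by_p` are names introduced here for illustration.

```r
alpha <- 0.05
z_obs <- (98.5 - 100) / (5 / sqrt(50))   # observed statistic, about -2.12

# Two-sided critical value: upper alpha/2 quantile of the standard normal
z_crit <- qnorm(1 - alpha / 2)           # about 1.96

# Two equivalent ways to reach the decision
reject_by_z <- abs(z_obs) > z_crit               # statistic beyond the critical value
reject_by_p <- 2 * pnorm(-abs(z_obs)) < alpha    # p-value below alpha

list(z_crit = z_crit, reject_by_z = reject_by_z, reject_by_p = reject_by_p)
```

Both rules always agree for a two-sided z-test, because the p-value drops below α exactly when \(|z|\) exceeds the critical value.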