What is Hypothesis Testing?

  • A method of statistical inference to test assumptions (hypotheses) about population parameters.
  • Null Hypothesis (H₀): The default assumption, presumed true unless the data give strong evidence against it.
  • Alternative Hypothesis (H₁): The competing claim we look for evidence to support.

Real-World Example

Suppose a factory claims its light bulbs last 1000 hours on average. We suspect they don’t.

  • H₀: μ = 1000
  • H₁: μ ≠ 1000

The Process

  1. State the hypotheses (H₀ and H₁)
  2. Choose a significance level (α), e.g. 0.05
  3. Compute the test statistic
  4. Determine the p-value
  5. Draw a conclusion: reject H₀ if p ≤ α, otherwise fail to reject H₀
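
The five steps can be sketched in R roughly as follows. This is a minimal illustration, not a definitive implementation: `z_test` is a hypothetical helper name, the lifetimes are made-up values, and the population standard deviation is assumed to be known (here 10).

```r
# Hypothetical helper: two-sided one-sample z-test with known sigma.
z_test <- function(x, mu0, sigma, alpha = 0.05) {
  n <- length(x)
  z <- (mean(x) - mu0) / (sigma / sqrt(n))   # step 3: test statistic
  p_value <- 2 * pnorm(-abs(z))              # step 4: two-sided p-value
  decision <- if (p_value <= alpha) "reject H0" else "fail to reject H0"  # step 5
  list(z = z, p_value = p_value, decision = decision)
}

# Steps 1 and 2: H0: mu = 1000 vs H1: mu != 1000, with alpha = 0.05.
lifetimes <- c(992, 1003, 998, 1001, 996, 1005, 994, 999)  # made-up data
result <- z_test(lifetimes, mu0 = 1000, sigma = 10, alpha = 0.05)
print(result)
```

With these made-up values the sample mean (998.5) sits well within one standard error of 1000, so the sketch fails to reject H₀.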

Math: Test Statistic

\[ z = \frac{\bar{x} - \mu}{\sigma / \sqrt{n}} \]

Where:

  • \(\bar{x}\): sample mean
  • \(\mu\): population mean hypothesized under H₀
  • \(\sigma\): population standard deviation (assumed known)
  • \(n\): sample size
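
As a worked example with made-up numbers (not the factory data), suppose \(\bar{x} = 997\), \(\mu = 1000\), \(\sigma = 10\), and \(n = 25\):

\[ z = \frac{997 - 1000}{10 / \sqrt{25}} = \frac{-3}{2} = -1.5 \]

The sample mean lies 1.5 standard errors below the hypothesized mean.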

Math: p-value Interpretation

\[ p = 2\,P(Z \geq |z|) \]

  • The factor 2 counts both tails, because H₁: μ ≠ 1000 is two-sided
  • p ≤ α (e.g. α = 0.05): reject H₀
  • p > α: fail to reject H₀
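
To make the tail probability concrete, the sketch below computes it with `pnorm` for a hypothetical test statistic of z = -2.5 (a value chosen only for illustration). For a two-sided alternative both tails count, so the one-tail probability is doubled.

```r
z_example <- -2.5   # hypothetical test statistic, for illustration only
alpha <- 0.05       # assumed significance level

p_one_tail <- pnorm(-abs(z_example))   # P(Z >= |z|), a single tail
p_two_sided <- 2 * p_one_tail          # both tails, for H1: mu != mu0

reject_h0 <- p_two_sided <= alpha
cat("one tail:", p_one_tail, " two-sided:", p_two_sided,
    " reject H0:", reject_h0, "\n")
```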

Code Setup (R)

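library(ggplot2)  # used for the plots that follow
set.seed(123)     # make the simulation reproducible
# Simulate 100 bulb lifetimes whose true mean (995) is below the claimed 1000
data <- rnorm(100, mean=995, sd=10)
sample_mean <- mean(data)
sample_sd <- sd(data)
n <- length(data)
# Test statistic; the sample sd stands in for the unknown sigma (reasonable for n = 100)
z <- (sample_mean - 1000) / (sample_sd / sqrt(n))
z
## [1] -4.487149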
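
Because the statistic above is built from the sample standard deviation rather than a known σ, it coincides with the one-sample t statistic. The sketch below, which regenerates the same simulated data so that it runs on its own, adds the two-sided p-value and cross-checks the statistic against base R's `t.test`. The 0.05 threshold is an assumed significance level.

```r
set.seed(123)
data <- rnorm(100, mean = 995, sd = 10)   # same simulated sample as above

n <- length(data)
z <- (mean(data) - 1000) / (sd(data) / sqrt(n))

# Two-sided p-value from the standard normal approximation
p_value <- 2 * pnorm(-abs(z))

# Cross-check: t.test() uses the same statistic but a t reference distribution
tt <- t.test(data, mu = 1000, alternative = "two.sided")

cat("z =", z, " normal p =", p_value, " t-test p =", tt$p.value, "\n")
if (p_value <= 0.05) cat("Reject H0 at alpha = 0.05\n")
```

With n = 100 the normal and t reference distributions are close, so both p-values lead to the same decision here.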

Histogram of Sample Data

ggplot(data.frame(x=data), aes(x)) +
  geom_histogram(bins=30, fill="lightblue", color="black") +
  geom_vline(xintercept=1000, color="red", linetype="dashed") +
  labs(title="Sample Light Bulb Lifespan", x="Hours", y="Count")

P-value Area Explained

  • Shows how far the sample mean lies from the mean hypothesized under H₀, measured in standard errors
  • The area in red is the p-value: the probability, assuming H₀ is true, of a result at least as extreme as the one observed

P-value Area Visualization
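
One way such a plot could be drawn is sketched below. It is an illustrative reconstruction under assumptions, not necessarily the original figure: it shades in red both tails of the standard normal density beyond the observed statistic, and `z_obs` and `grid` are names introduced here.

```r
library(ggplot2)

z_obs <- -4.487149   # test statistic reported in the code setup
grid <- data.frame(z = seq(-6, 6, length.out = 601))
grid$density <- dnorm(grid$z)   # standard normal density under H0

p <- ggplot(grid, aes(z, density)) +
  geom_line() +
  # Shade both tails at least as extreme as the observed statistic
  geom_area(data = subset(grid, z <= -abs(z_obs)), fill = "red", alpha = 0.6) +
  geom_area(data = subset(grid, z >= abs(z_obs)), fill = "red", alpha = 0.6) +
  geom_vline(xintercept = c(-1, 1) * abs(z_obs), linetype = "dashed") +
  labs(title = "Two-sided p-value area under H0", x = "z", y = "Density")
print(p)

tail_area <- 2 * pnorm(-abs(z_obs))   # the shaded area, i.e. the p-value
```

Because the observed statistic is so far out in the tail, the shaded regions are barely visible, which is itself a picture of how small the p-value is.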

3D Plot (Surface of Normal Distribution)

Conclusion

  • Hypothesis testing helps assess claims using data
  • Comparing the p-value with the significance level α guides the decision to reject or not reject H₀
  • Visualization strengthens understanding