2026-06-06

Understanding Hypothesis Testing

What is Hypothesis Testing?

  • A statistical method used to decide whether there is enough evidence in a sample to support or reject a claim about a population parameter.
  • It compares two competing statements: Null Hypothesis and Alternative Hypothesis.
  • Key concepts include Significance Level, P-Value, Test Statistic, Type I Error, and Type II Error.

Motivation

Suppose a bottling company claims that each bottle contains 500 mL on average.

Does the evidence from a sample support this claim?

Hypothesis testing helps us answer questions like this using data.

Null and Alternative Hypotheses

The null hypothesis:

\[H_0:\mu = 500\]

The alternative hypothesis:

\[H_a:\mu \neq 500\]

Where:

  • \(H_0\) is the null hypothesis.
  • \(H_a\) is the alternative hypothesis.
  • \(\mu\) is the population mean.

Test Statistic

For a z-test, which applies when the population standard deviation \(\sigma\) is known, the test statistic is

\[z=\frac{\bar{x}-\mu_0}{\sigma/\sqrt{n}}\]

where:

  • \(\bar{x}\) = sample mean
  • \(\mu_0\) = hypothesized mean
  • \(\sigma\) = population standard deviation
  • \(n\) = sample size
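The formula above can be sketched as a small R function. The name z_statistic and its argument names are hypothetical, chosen here only for illustration, and the inputs in the usage line are the bottling figures used in this presentation.

```r
# Hypothetical helper (not from the original slides):
# z statistic for a one-sample z-test with known population sd.
z_statistic <- function(x_bar, mu_0, sigma, n) {
  stopifnot(sigma > 0, n > 0)
  (x_bar - mu_0) / (sigma / sqrt(n))
}

# Bottling figures: sample mean 495, hypothesized mean 500, sigma 12, n 36
z_statistic(x_bar = 495, mu_0 = 500, sigma = 12, n = 36)  # -2.5
```

Dividing by \(\sigma/\sqrt{n}\) rather than \(\sigma\) expresses the gap between \(\bar{x}\) and \(\mu_0\) in units of the standard error of the mean.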

Example

Suppose:

  • Sample size: \(n = 36\)
  • Sample mean: \(\bar{x} = 495\) mL
  • Population standard deviation: \(\sigma = 12\) mL

Then

\[z=\frac{495-500}{12/\sqrt{36}} =-2.5\]

Since \(|z| = 2.5\) exceeds 1.96, the two-sided critical value at the conventional 5% significance level, the sample provides evidence against the null hypothesis.
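To turn the test statistic into a decision, one option is to compute a two-sided p-value and compare it with a significance level. The sketch below assumes \(\alpha = 0.05\), which the example does not state, and uses only base R functions (pnorm, qnorm).

```r
# Inputs from the example
x_bar <- 495
mu_0  <- 500
sigma <- 12
n     <- 36
alpha <- 0.05  # assumed significance level (not given in the example)

# Test statistic
z <- (x_bar - mu_0) / (sigma / sqrt(n))

# Two-sided p-value: probability of a statistic at least this extreme under H0
p_value <- 2 * pnorm(-abs(z))

# Two-sided critical value for the chosen alpha
z_crit <- qnorm(1 - alpha / 2)

# Decision rule: reject H0 when |z| exceeds the critical value
reject_h0 <- abs(z) > z_crit

round(c(z = z, p_value = p_value, z_crit = z_crit), 4)
reject_h0
```

Under these assumptions the p-value is about 0.012, which is below 0.05, so the null hypothesis would be rejected at that level.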

Sampling Distribution (ggplot)

This curve represents the sampling distribution of the sample mean expected if the company’s claim is true.
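The code behind this figure is not included, so the following is only a sketch of how such a curve might be drawn. It assumes the example's values (\(\mu_0 = 500\), \(\sigma = 12\), \(n = 36\)), which give a standard error of 2; the names df_curve and se are illustrative.

```r
# Assumed parameters, taken from the bottling example
mu_0 <- 500
se   <- 12 / sqrt(36)  # standard error of the mean = 2

# Grid of sample-mean values covering 4 standard errors on each side
x <- seq(mu_0 - 4 * se, mu_0 + 4 * se, length.out = 201)
df_curve <- data.frame(x = x, density = dnorm(x, mean = mu_0, sd = se))

# Plot only if ggplot2 is installed
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(df_curve, aes(x = x, y = density)) +
    geom_line() +
    labs(x = "Sample mean (mL)", y = "Density")
  print(p)
}
```

An observed sample mean of 495 lies 2.5 standard errors below the center of this curve, which is why it counts as unusual under the claim.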

Simulated Sample Means (ggplot)

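The histogram shows the distribution of sample means from 1000 simulated samples, each of size 36, drawn from a normal distribution with mean 500 and standard deviation 12.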

Interactive Plotly Visualization

This graph shows the standard normal distribution used in many hypothesis tests.

R Code Example

The following code generates the histogram shown previously.

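# Load ggplot2 for plotting
library(ggplot2)

# Fix the seed so the simulation is reproducible
set.seed(123)

# Draw 1000 samples of size 36 under H0 and record each sample mean
means <- replicate(
  1000,
  mean(rnorm(36, mean = 500, sd = 12))
)

df_means <- data.frame(means)

# Histogram of the simulated sample means
ggplot(df_means, aes(x = means)) +
  geom_histogram(bins = 25)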
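The same simulation can illustrate the Type I error mentioned at the start. Every sample here is generated with the null hypothesis true, so each rejection is a false alarm. The sketch below assumes \(\alpha = 0.05\) (not stated in the slides) and applies a two-sided z-test to each simulated mean; the name type1_rate is illustrative.

```r
set.seed(123)

mu_0  <- 500
sigma <- 12
n     <- 36
alpha <- 0.05  # assumed significance level

# 1000 sample means simulated with H0 true
means <- replicate(1000, mean(rnorm(n, mean = mu_0, sd = sigma)))

# z statistic and two-sided p-value for each simulated sample
z        <- (means - mu_0) / (sigma / sqrt(n))
p_values <- 2 * pnorm(-abs(z))

# Fraction of samples that would (wrongly) reject H0
type1_rate <- mean(p_values < alpha)
type1_rate
```

The resulting fraction should be close to \(\alpha\), since the significance level is by definition the probability of rejecting a true null hypothesis.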

Conclusion

Key ideas of hypothesis testing:

  • State the null and alternative hypotheses.
  • Calculate a test statistic.
  • Determine a p-value.
  • Make a decision based on the evidence.
  • Use data to draw conclusions about a population.

Hypothesis testing is one of the most widely used tools in statistics.