Understanding P-Values

What is a P-Value?

A p-value is the probability of observing a result at least as extreme as the one actually measured, given that the null hypothesis \(H_0\) is true.

It quantifies how compatible the data is with \(H_0\)
A small p-value → data is unlikely under \(H_0\)
A large p-value → data is consistent with \(H_0\)
The p-value is compared against a significance level \(\alpha\) chosen before seeing the data; the most widely used value is \(\alpha = 0.05\)

The Hypothesis Testing Framework

Every test begins with two competing claims:

\[H_0: \mu = \mu_0 \quad \text{(null hypothesis)}\]

\[H_1: \mu \neq \mu_0 \quad \text{(alternative hypothesis)}\]

For a one-sample z-test with sample mean \(\bar{X}\), sample size \(n\), and known population standard deviation \(\sigma\), the test statistic is:

\[Z = \frac{\bar{X} - \mu_0}{\sigma / \sqrt{n}}\]

The corresponding two-tailed p-value is:

\[p = 2 \times P\left(Z \geq |z_{obs}| \mid H_0 \text{ is true}\right)\]
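
Because the standard normal distribution is symmetric about zero, the two-tailed probability can be rewritten in terms of the standard normal CDF. The symbol \(\Phi\) for that CDF is introduced here for illustration and is not used elsewhere in these slides:

\[p = 2 \times P\left(Z \geq |z_{obs}|\right) = 2\left(1 - \Phi(|z_{obs}|)\right) = 2\,\Phi(-|z_{obs}|)\]

The last form needs only a single lower-tail lookup, which is why it is the convenient one to evaluate in software.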

Visualizing the P-Value

P-Value vs. Significance Level \(\alpha\)

P-value Range	Interpretation
\(p < 0.01\)	Very strong evidence against \(H_0\)
\(0.01 \leq p < 0.05\)	Strong evidence against \(H_0\)
\(0.05 \leq p < 0.10\)	Marginal evidence against \(H_0\)
\(p \geq 0.10\)	Insufficient evidence against \(H_0\)

The decision rule is:

\[\text{Reject } H_0 \text{ if and only if } p < \alpha\]
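
The table and the decision rule above can be sketched in R as follows. The helper name `evidence_label` and the short labels are hypothetical, chosen only for this illustration; the cut points are the ones in the table, which are conventions rather than strict thresholds.

```r
# Hypothetical helper: map p-values onto the evidence categories in the table.
evidence_label <- function(p) {
  stopifnot(is.numeric(p), all(p >= 0 & p <= 1))
  # Intervals are closed on the left: [0, 0.01), [0.01, 0.05), [0.05, 0.10), [0.10, 1]
  cut(p,
      breaks = c(0, 0.01, 0.05, 0.10, 1),
      labels = c("Very strong", "Strong", "Marginal", "Insufficient"),
      right = FALSE, include.lowest = TRUE)
}

# Example p-values (made up for illustration), with the decision at alpha = 0.05
p <- c(0.004, 0.0455, 0.05, 0.2)
data.frame(p = p, evidence = evidence_label(p), reject_at_0.05 = p < 0.05)
```

Note that \(p = 0.05\) exactly falls in the "Marginal" row and is not rejected, because the rule uses a strict inequality.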

A Practical Example

Scenario: A coffee shop claims its drinks contain \(\mu_0 = 200\) mg of caffeine. A sample of \(n = 49\) drinks yields a mean of \(\bar{X} = 194\) mg, and the population standard deviation is known to be \(\sigma = 21\) mg.

\[Z = \frac{194 - 200}{21 / \sqrt{49}} = \frac{-6}{3} = -2.0\]

\[p = 2 \times P(Z \leq -2.0) = 2 \times 0.02275 \approx 0.0455\]

Since \(p \approx 0.0455 < \alpha = 0.05\), we reject \(H_0\).

At the 5% significance level, there is statistically significant evidence that the mean caffeine content differs from the claimed 200 mg.

P-Value Distribution: \(H_0\) vs \(H_1\)
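
One way to see this comparison is a small simulation. The sketch below reuses the coffee example's \(n\), \(\sigma\), and \(\mu_0\); the number of simulated studies, the seed, and the assumed true mean of 194 mg under \(H_1\) are illustrative choices, not values from the slides.

```r
set.seed(1)        # arbitrary seed, for reproducibility only
n_sims <- 10000    # number of simulated studies (assumed)
n      <- 49
sigma  <- 21
mu_0   <- 200

# Simulate the sample mean directly: X_bar ~ Normal(mu_true, sigma / sqrt(n)),
# then compute the two-tailed z-test p-value for each simulated study.
sim_p <- function(mu_true) {
  x_bar <- rnorm(n_sims, mean = mu_true, sd = sigma / sqrt(n))
  z     <- (x_bar - mu_0) / (sigma / sqrt(n))
  2 * pnorm(-abs(z))
}

p_h0 <- sim_p(200)   # H0 true: the mean really is 200 mg
p_h1 <- sim_p(194)   # H1 true: assumed true mean of 194 mg

mean(p_h0 < 0.05)    # share of false rejections, expected to be close to alpha
mean(p_h1 < 0.05)    # share of correct rejections (the power of the test)

# Side-by-side histograms of the two p-value distributions
op <- par(mfrow = c(1, 2))
hist(p_h0, breaks = 20, main = "H0 true", xlab = "p-value")
hist(p_h1, breaks = 20, main = "H1 true", xlab = "p-value")
par(op)
```

Under \(H_0\) the histogram should look roughly flat, while under this particular \(H_1\) the p-values pile up near zero and roughly half of them fall below 0.05.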

3D View: P-Value as a Function of Z and Sample Size
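
The p-value depends on the sample size only through \(Z\): for a fixed observed difference \(\bar{X} - \mu_0\), a larger \(n\) shrinks the standard error, pushes \(|Z|\) up, and pulls the p-value down. The grid below is a minimal sketch of that surface, assuming \(\sigma = 21\) from the coffee example; the differences and sample sizes are hypothetical values chosen for illustration.

```r
sigma <- 21
diffs <- c(-2, -4, -6, -8)         # hypothetical observed differences x_bar - mu_0 (mg)
ns    <- c(10, 25, 49, 100, 400)   # hypothetical sample sizes

# Two-tailed p-value for every combination of difference and sample size
p_grid <- outer(diffs, ns, function(d, n) {
  z <- d / (sigma / sqrt(n))
  2 * pnorm(-abs(z))
})
dimnames(p_grid) <- list(diff = as.character(diffs), n = as.character(ns))

round(p_grid, 4)
```

The cell for a difference of -6 mg and \(n = 49\) reproduces the coffee example, and each row decreases from left to right as \(n\) grows.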

The R Code Behind the Z-Test

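# Inputs from the coffee shop example
x_bar <- 194   # sample mean (mg)
mu_0  <- 200   # claimed mean under H0 (mg)
sigma <- 21    # known population standard deviation (mg)
n     <- 49    # sample size

# Test statistic: difference in means divided by the standard error
z <- (x_bar - mu_0) / (sigma / sqrt(n))
z

# Two-tailed p-value: twice the lower-tail area below -|z|
p_value <- 2 * pnorm(-abs(z))
p_value

# Decision rule: reject H0 if and only if p < alpha
alpha <- 0.05
if (p_value < alpha) {
  cat("Reject H0: significant evidence against the claim.\n")
} else {
  cat("Fail to reject H0: insufficient evidence.\n")
}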

Common Misconceptions

A p-value does NOT mean:

The probability that \(H_0\) is true
The chance the result happened by luck alone
That the effect is large or meaningful

The correct interpretation:

\[p = P(\text{observing data this extreme} \mid H_0 \text{ true}) \neq P(H_0 \text{ true} \mid \text{data})\]
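
To see why the two sides differ, Bayes' theorem relates them through a prior probability \(P(H_0)\), a quantity the p-value never uses. The expression below is a schematic sketch, with "data" standing loosely for the observed result:

\[P(H_0 \mid \text{data}) = \frac{P(\text{data} \mid H_0)\, P(H_0)}{P(\text{data})}\]

Without a value for \(P(H_0)\), no p-value can be converted into the probability that \(H_0\) is true.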

Statistical significance \(\neq\) practical significance

Effect sizes and confidence intervals should always accompany p-values.
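
For the coffee example, both quantities can be computed alongside the p-value. This is a minimal sketch, assuming a z-based interval is appropriate because \(\sigma\) is known, and using the standardized mean difference \((\bar{X} - \mu_0)/\sigma\) as the effect size; other effect size measures are possible.

```r
x_bar <- 194; mu_0 <- 200; sigma <- 21; n <- 49
alpha <- 0.05

se     <- sigma / sqrt(n)                  # standard error of the mean: 3 mg
z_crit <- qnorm(1 - alpha / 2)             # two-sided critical value, about 1.96
ci     <- x_bar + c(-1, 1) * z_crit * se   # 95% confidence interval for mu
d      <- (x_bar - mu_0) / sigma           # standardized mean difference

ci
d
```

The interval runs from roughly 188.1 to 199.9 mg, so it excludes 200 mg, in line with rejecting \(H_0\), but its upper end sits very close to the claim. The standardized difference is about -0.29 standard deviations, which shows how far the sample mean sits from the claim in a way the p-value alone does not.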

Summary

P-values measure evidence against \(H_0\), not in favor of \(H_1\)
Reject \(H_0\) when \(p < \alpha\); commonly \(\alpha = 0.05\)
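Under a true \(H_0\), p-values from a continuous test statistic are uniformly distributed on \([0, 1]\)
Under a true \(H_1\), p-values concentrate near zero, more strongly as the effect or the sample size grows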
Always pair p-values with effect sizes for meaningful conclusions

\[\text{Rigorous inference} = \text{p-values} + \text{effect sizes} + \text{replication}\]