The P-Value in Statistical Inference

What is a P-Value?

The p-value is one of the most widely used, and most widely misunderstood, concepts in statistics.

Formal Definition:

The p-value is the probability of observing a test statistic at least as extreme as the one actually observed, assuming the null hypothesis is true.

In plain terms: How surprising is our data if nothing interesting is happening?

A small p-value → data is unlikely under \(H_0\) → evidence against \(H_0\)
A large p-value → data is consistent with \(H_0\) → no strong evidence against \(H_0\)

Hypothesis Testing Framework

Every p-value lives inside a hypothesis test:

\(H_0\): Null hypothesis — the default assumption (e.g., “no effect”, “no difference”)
\(H_a\): Alternative hypothesis — what we want to detect

The logic:

1. Assume \(H_0\) is true
2. Compute a test statistic from the data
3. Ask: how often would we see a result this extreme by chance?
4. That probability is the p-value

Decision rule: Reject \(H_0\) if \(p\text{-value} < \alpha\), where \(\alpha\) is the chosen significance level (commonly \(\alpha = 0.05\)).
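
The steps above can be sketched as a small Monte Carlo experiment in R. This is a minimal illustration rather than a standard test procedure: the scenario (16 heads in 20 coin flips), the number of simulations, and all variable names are assumptions made up for this sketch.

```r
# Hypothetical scenario (not from the text): 16 heads in 20 flips
set.seed(1)
n_flips        <- 20
observed_heads <- 16
alpha          <- 0.05
n_sims         <- 100000

# Step 1: assume H0 is true (fair coin) and simulate many experiments
sim_heads <- rbinom(n_sims, size = n_flips, prob = 0.5)

# Step 2: test statistic = distance of the head count from its H0 expectation
expected     <- n_flips * 0.5
observed_gap <- abs(observed_heads - expected)

# Steps 3-4: share of simulated results at least as extreme as the observed one
p_value <- mean(abs(sim_heads - expected) >= observed_gap)

# Decision rule: reject H0 if the p-value is below alpha
reject_h0 <- p_value < alpha
print(p_value)
print(reject_h0)
```

The simulated p-value is only an approximation; it gets closer to the exact tail probability as `n_sims` grows.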

The Math Behind the P-Value

For a one-sample z-test comparing a sample mean \(\bar{x}\) to a hypothesized mean \(\mu_0\), with known population standard deviation \(\sigma\) and sample size \(n\):

\[Z = \frac{\bar{X} - \mu_0}{\sigma / \sqrt{n}}\]

Under \(H_0\), the test statistic follows a standard normal distribution: \(Z \sim \mathcal{N}(0,1)\)

The p-values for each alternative are:

\[p = P(Z \geq z_{\text{obs}}) \quad \text{(right-tailed)}\]
\[p = P(Z \leq z_{\text{obs}}) \quad \text{(left-tailed)}\]
\[p = 2 \cdot P(Z \geq |z_{\text{obs}}|) \quad \text{(two-tailed)}\]

For a t-test (unknown \(\sigma\)), replace \(\sigma\) with the sample standard deviation \(s\) and \(Z\) with \(T \sim t_{n-1}\).
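
In R, these tail probabilities map onto `pnorm()` for the z-test and `pt()` for the t-test. The sketch below uses a made-up observed statistic of 1.8 and a made-up sample size of 15; both values and all variable names are assumptions for illustration only.

```r
# Hypothetical observed statistic, chosen only for illustration
z_obs <- 1.8

# Right-tailed: P(Z >= z_obs)
p_right <- pnorm(z_obs, lower.tail = FALSE)

# Left-tailed: P(Z <= z_obs)
p_left <- pnorm(z_obs)

# Two-tailed: 2 * P(Z >= |z_obs|)
p_two <- 2 * pnorm(abs(z_obs), lower.tail = FALSE)

# t-test version: same statistic, but referred to a t distribution
# with n - 1 degrees of freedom (n = 15 is an assumed sample size)
n       <- 15
t_obs   <- z_obs
p_two_t <- 2 * pt(abs(t_obs), df = n - 1, lower.tail = FALSE)

print(c(right = p_right, left = p_left, two = p_two, two_t = p_two_t))
```

For the same observed statistic, the t-based p-value is larger than the z-based one because the t distribution has heavier tails; the gap shrinks as \(n\) grows.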

Visualizing the P-Value

The shaded regions represent the p-value: for a two-tailed test, the combined probability in both tails.

Example: Are Coins Fair?

Question: A coin is flipped 40 times and lands heads 28 times. Is the coin fair?

\(H_0: p = 0.5\) (fair coin) \(\quad\) \(H_a: p \neq 0.5\) (two-tailed)

\[\hat{p} = \frac{28}{40} = 0.70, \quad z = \frac{0.70 - 0.50}{\sqrt{0.5 \cdot 0.5 / 40}} \approx 2.53\]

\[p\text{-value} = 2 \cdot P(Z \geq 2.53) \approx 0.0114\]

Since \(p \approx 0.011 < 0.05 = \alpha\), we reject \(H_0\). At the 5% significance level, there is statistically significant evidence that the coin is not fair.
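
The hand calculation above can be reproduced step by step in R. This is a minimal sketch using the normal approximation from the formulas in this example; the variable names are illustrative.

```r
# Coin example: 28 heads in 40 flips, H0: p = 0.5
heads  <- 28
flips  <- 40
p_null <- 0.5

# Sample proportion
p_hat <- heads / flips

# Standard error computed under H0
se_null <- sqrt(p_null * (1 - p_null) / flips)

# Observed z statistic
z_obs <- (p_hat - p_null) / se_null

# Two-tailed p-value from the standard normal distribution
p_value <- 2 * pnorm(abs(z_obs), lower.tail = FALSE)

print(round(z_obs, 2))    # about 2.53
print(round(p_value, 4))  # about 0.0114
```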

P-Value Across Significance Levels

The dashed line marks the observed p-value. For any \(\alpha\) to the right of the dashed line (that is, any \(\alpha\) larger than the p-value), we reject \(H_0\).
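
The same comparison can be tabulated directly. The sketch below takes the coin example's p-value of about 0.0114 and checks it against a few commonly used significance levels; the particular set of levels is an illustrative choice.

```r
# p-value from the coin example (28 heads in 40 flips)
p_value <- 0.01141

# A few candidate significance levels (illustrative choice)
alphas <- c(0.001, 0.01, 0.05, 0.10)

# Reject H0 wherever the p-value falls below alpha
decisions <- data.frame(alpha = alphas, reject_h0 = p_value < alphas)
print(decisions)
```

With these levels, \(H_0\) is rejected at \(\alpha = 0.05\) and \(\alpha = 0.10\) but not at \(\alpha = 0.01\) or \(\alpha = 0.001\), which is why \(\alpha\) should be fixed before looking at the data.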

Simulating P-Values Under \(H_0\)
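
One way to see how p-values behave when \(H_0\) is actually true is to simulate many data sets under \(H_0\) and test each one. The sketch below assumes normally distributed data with known \(\sigma\) and a two-tailed z-test; the sample size, number of simulations, and variable names are illustrative choices. Under these assumptions the p-values should be roughly uniform on \([0, 1]\), so about a fraction \(\alpha\) of them fall below \(\alpha\).

```r
# Simulate two-tailed z-test p-values when H0 is true
set.seed(42)
n_sims <- 10000  # number of simulated experiments (assumed)
n      <- 30     # sample size per experiment (assumed)
mu_0   <- 0      # mean under H0
sigma  <- 1      # population SD, treated as known
alpha  <- 0.05

p_values <- replicate(n_sims, {
  x <- rnorm(n, mean = mu_0, sd = sigma)         # data generated under H0
  z <- (mean(x) - mu_0) / (sigma / sqrt(n))      # z statistic
  2 * pnorm(abs(z), lower.tail = FALSE)          # two-tailed p-value
})

# Share of simulations that (wrongly) reject H0: the empirical Type I error rate
type1_rate <- mean(p_values < alpha)
print(type1_rate)

# Counts per decile; roughly equal counts indicate a uniform distribution
print(table(cut(p_values, breaks = seq(0, 1, by = 0.1))))
```

Calling `hist(p_values)` on the result gives a visual check: the bars should be approximately flat.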

R Code: Running the Test

# One-proportion z-test: 28 heads in 40 flips
result <- prop.test(x = 28, n = 40, p = 0.5,
                    alternative = "two.sided",
                    correct = FALSE)
result

## 
##  1-sample proportions test without continuity correction
## 
## data:  28 out of 40, null probability 0.5
## X-squared = 6.4, df = 1, p-value = 0.01141
## alternative hypothesis: true p is not equal to 0.5
## 95 percent confidence interval:
##  0.5456998 0.8192515
## sample estimates:
##   p 
## 0.7

Common Misconceptions

❌ Myth	✅ Truth
P-value = probability \(H_0\) is true	The p-value is computed assuming \(H_0\) is true; it is not \(P(H_0 \mid \text{data})\)
\(p < 0.05\) means a large effect	Statistical significance ≠ practical significance
\(p > 0.05\) proves \(H_0\)	Failing to reject is not proof \(H_0\) is true
Smaller p-value = better result	P-values depend heavily on sample size

Always report effect sizes and confidence intervals alongside p-values.
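
The dependence on sample size is easy to demonstrate. The sketch below holds a small, made-up effect fixed (an observed proportion of 0.55 against a null of 0.5) and varies only \(n\); the helper `p_for_n` and all the numbers are hypothetical, chosen for illustration.

```r
# Two-tailed p-value of a one-proportion z-test for a fixed observed proportion
# (hypothetical helper; p_hat = 0.55 is an assumed effect)
p_for_n <- function(n, p_hat = 0.55, p_null = 0.5) {
  z <- (p_hat - p_null) / sqrt(p_null * (1 - p_null) / n)
  2 * pnorm(abs(z), lower.tail = FALSE)
}

sample_sizes <- c(40, 400, 4000)
p_values     <- sapply(sample_sizes, p_for_n)

# Same effect size in every row; only n changes
print(data.frame(n = sample_sizes, p_value = signif(p_values, 3)))
```

The p-value is roughly 0.53 at \(n = 40\), roughly 0.046 at \(n = 400\), and far below 0.001 at \(n = 4000\), even though the effect (0.55 versus 0.5) is identical in all three cases.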

Summary

Concept	Key Point
Definition	\(P(\text{data at least this extreme} \mid H_0 \text{ true})\)
Small p-value	Evidence against \(H_0\)
Significance level \(\alpha\)	Threshold chosen before the test
Type I Error	Rejecting a true \(H_0\) — probability = \(\alpha\)
Type II Error	Failing to reject a false \(H_0\) — probability = \(\beta\)

Remember: The p-value is a tool, not a verdict. Use it alongside effect sizes, confidence intervals, and domain knowledge for sound statistical inference.