What is a p-value?

A p-value is a probability used in hypothesis testing.

It measures how unusual the observed sample result would be if the null hypothesis were true.

A small p-value means the observed data are not very consistent with the null hypothesis.

Hypothesis testing setup

Most hypothesis tests begin with two competing statements:

\[ H_0: \text{the default claim or no-effect claim} \]

\[ H_a: \text{the alternative claim that we are trying to find evidence for} \]

The p-value is calculated under the assumption that \(H_0\) is true.

Mathematical definition

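For a two-sided test with test statistic \(T\) and observed value \(t_{obs}\), where the null distribution of \(T\) is symmetric about 0 (as it is for z and t statistics), the p-value can be written as: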

\[ p\text{-value}=P\left(|T|\ge |t_{obs}|\mid H_0\text{ is true}\right) \]

For a one-sided right-tail test:

\[ p\text{-value}=P\left(T\ge t_{obs}\mid H_0\text{ is true}\right) \]
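When \(T\) follows a t distribution under \(H_0\), both formulas translate directly into R. The following is a minimal sketch that assumes the degrees of freedom are known. The function names `p_two_sided` and `p_right_tail` are hypothetical helpers, not part of base R.

```r
# Hypothetical helpers: p-values for a t statistic with `df` degrees of freedom

p_two_sided <- function(t_obs, df) {
  # P(|T| >= |t_obs|) = 2 * P(T <= -|t_obs|), using symmetry of the t distribution
  2 * pt(-abs(t_obs), df = df)
}

p_right_tail <- function(t_obs, df) {
  # P(T >= t_obs): upper-tail area beyond the observed statistic
  pt(t_obs, df = df, lower.tail = FALSE)
}

p_two_sided(2.625, df = 35)
p_right_tail(2.625, df = 35)
```

For a positive observed statistic, the two-sided p-value is exactly twice the right-tail p-value. This is why the right-tail value here is about half of the two-sided value.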

Visual idea with a normal curve

The shaded tail areas represent values at least as extreme as the observed statistic.
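For a statistic that is standard normal under \(H_0\), the shaded two-sided area can be computed in two ways: with `pnorm`, or by numerically integrating the normal density over both tails. This sketch assumes a hypothetical observed value \(z_{obs} = 2\), chosen only for illustration.

```r
z_obs <- 2  # hypothetical observed z statistic, for illustration only

# Closed form: twice the lower-tail area beyond -|z_obs|
p_closed <- 2 * pnorm(-abs(z_obs))

# Same quantity as the sum of the two shaded tail areas under the density
left_tail  <- integrate(dnorm, lower = -Inf, upper = -abs(z_obs))$value
right_tail <- integrate(dnorm, lower = abs(z_obs), upper = Inf)$value
p_integrated <- left_tail + right_tail

c(closed_form = p_closed, integrated = p_integrated)
```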

Example: filling bags of coffee

A machine is supposed to fill coffee bags with an average weight of 50 ounces.

A random sample of 36 bags has:

\[ \bar{x}=52.1, \quad s=4.8, \quad n=36 \]

We test:

\[ H_0:\mu=50 \qquad H_a:\mu\ne 50 \]
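The population standard deviation is unknown and is estimated by \(s\), so the test uses the one-sample t statistic with \(n-1\) degrees of freedom. Substituting \(\mu_0 = 50\) and the sample summaries gives:

\[ t_{obs}=\frac{\bar{x}-\mu_0}{s/\sqrt{n}}=\frac{52.1-50}{4.8/\sqrt{36}}=\frac{2.1}{0.8}=2.625, \qquad df=n-1=35 \]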

R code for the example

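```r
# Null-hypothesis mean and sample summary statistics
mu0 <- 50
sample_mean <- 52.1
sample_sd <- 4.8
n <- 36

# One-sample t statistic: (x-bar - mu0) / (s / sqrt(n))
t_stat <- (sample_mean - mu0) / (sample_sd / sqrt(n))
# Two-sided p-value: twice the lower-tail area beyond -|t|, with n - 1 df
p_value <- 2 * pt(-abs(t_stat), df = n - 1)

round(c(t_stat = t_stat, p_value = p_value), 4)
```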

Computed result

```r
round(data.frame(t_statistic = t_stat, p_value = p_value), 4)
##   t_statistic p_value
## 1       2.625  0.0128
```

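At \(\alpha = 0.05\), the p-value of about 0.0128 is below \(\alpha\), so we reject \(H_0\). The sample provides evidence that the machine's mean fill weight differs from 50 ounces.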
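The same decision can also be read from a critical value. For a two-sided test at level \(\alpha\), rejecting when \(p < \alpha\) is equivalent to rejecting when \(|t_{obs}|\) exceeds \(t_{1-\alpha/2,\,n-1}\). Here is a short sketch that reuses the example's numbers:

```r
alpha <- 0.05
n <- 36
t_stat <- (52.1 - 50) / (4.8 / sqrt(n))

# Critical value: upper alpha/2 quantile of the t distribution with n - 1 df
t_crit <- qt(1 - alpha / 2, df = n - 1)
p_value <- 2 * pt(-abs(t_stat), df = n - 1)

# The two decision rules agree
reject_by_p    <- p_value < alpha
reject_by_crit <- abs(t_stat) > t_crit

round(t_crit, 3)
c(reject_by_p = reject_by_p, reject_by_crit = reject_by_crit)
```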

Simulated sample means

This ggplot shows a simulated sampling distribution under the null hypothesis \(\mu=50\).
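A comparable simulation can be sketched in base R. For illustration only, it assumes that fill weights under \(H_0\) are normal with mean 50 and standard deviation 4.8 (the sample \(s\)). The seed and the number of simulations are arbitrary. Each simulated sample is also converted to a t statistic. The share of simulated statistics at least as extreme as 2.625 should therefore land near the t-based p-value of 0.0128.

```r
set.seed(2024)  # arbitrary seed so the simulation is reproducible

mu0 <- 50
sample_sd <- 4.8  # assumed population SD under H0, borrowed from the sample
n <- 36
t_obs <- (52.1 - mu0) / (sample_sd / sqrt(n))
n_sims <- 10000

# Draw many samples under H0; record each sample mean and t statistic
sims <- replicate(n_sims, {
  x <- rnorm(n, mean = mu0, sd = sample_sd)
  c(mean = mean(x), t = (mean(x) - mu0) / (sd(x) / sqrt(n)))
})
sim_means <- sims["mean", ]
sim_t <- sims["t", ]

# Simulated two-sided p-value: share of |t| at least as large as observed
sim_p <- mean(abs(sim_t) >= abs(t_obs))
sim_p
```

Calling `hist(sim_means)` draws a base-R version of the null sampling distribution. The t statistics are used for the p-value because their null distribution does not depend on the assumed standard deviation.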

Interactive plotly view

This 3D plot shows how two-sided p-values change with the absolute t-statistic and degrees of freedom.
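The surface in that plot can also be tabulated directly. This minimal sketch evaluates the two-sided p-value \(2P(T \le -|t|)\) on a small, arbitrarily chosen grid of \(|t|\) values and degrees of freedom.

```r
abs_t <- c(1, 2, 2.625, 3)  # arbitrary |t| values
dfs   <- c(5, 10, 35, 100)  # arbitrary degrees of freedom

# Two-sided p-value for every (|t|, df) pair; pt() is vectorized in both arguments
p_grid <- outer(abs_t, dfs, function(t, df) 2 * pt(-t, df = df))
dimnames(p_grid) <- list(abs_t = abs_t, df = dfs)

round(p_grid, 4)
```

Reading down a column, the p-value shrinks as \(|t|\) grows. Reading across a row, it shrinks as the degrees of freedom grow, because the t distribution's tails get lighter. The entry for \(|t| = 2.625\) and 35 df reproduces the coffee example.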

Interpretation

The p-value is not the probability that \(H_0\) is true.

It is also not the probability that the result happened by random chance.

It is the probability of getting a result this extreme, or more extreme, assuming \(H_0\) is true.

Common mistakes

A p-value below 0.05 does not prove the alternative hypothesis.

A p-value above 0.05 does not prove the null hypothesis.

Statistical significance is not the same as practical importance.

The p-value should be interpreted with the study design, sample size, and subject-matter context.
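The role of sample size in these points can be illustrated by keeping the formula fixed and changing only the inputs. The numbers are hypothetical and chosen for illustration. The first case is a 0.1-ounce difference with the example's \(s = 4.8\) but a very large sample. The second is the example's 2.1-ounce difference with only 4 bags. The helper `t_test_p` is hypothetical, not a base R function.

```r
# Hypothetical helper: two-sided one-sample t-test p-value from summary statistics
t_test_p <- function(xbar, mu0, s, n) {
  t_stat <- (xbar - mu0) / (s / sqrt(n))
  2 * pt(-abs(t_stat), df = n - 1)
}

# Tiny difference (0.1 oz) with a huge sample: statistically significant
p_tiny_diff_big_n <- t_test_p(xbar = 50.1, mu0 = 50, s = 4.8, n = 100000)

# Same 2.1 oz difference as the example, but only 4 bags: not significant
p_same_diff_small_n <- t_test_p(xbar = 52.1, mu0 = 50, s = 4.8, n = 4)

round(c(tiny_diff_big_n = p_tiny_diff_big_n,
        same_diff_small_n = p_same_diff_small_n), 4)
```

A 0.1-ounce difference may be negligible in practice even though its p-value is tiny. The large p-value from 4 bags reflects too little data, not proof that \(\mu = 50\).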