\[ p\text{-value} = P(\text{observing results this extreme or more extreme under a model}) \]
A p-value measures how surprising the data are, assuming some baseline expectation or model is true: the smaller the p-value, the less compatible the data are with that model.
2025-11-16
What a p-value is NOT:
- Not the probability the model is true
- Not the probability the data happened by chance
- Not a measure of effect size or importance
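The last point can be illustrated with a short simulation: given a large enough sample, a practically negligible difference still produces a tiny p-value. This is a minimal sketch, assuming a hypothetical true mean of 5.02 tested against a hypothesized mean of 5; the numbers and the names `x_big`, `res`, and `effect` are illustrative only.

```r
set.seed(1)

# Hypothetical setup: the true mean differs from the hypothesized 5 by only 0.02
x_big <- rnorm(1e6, mean = 5.02, sd = 1)

res <- t.test(x_big, mu = 5)

# Estimated difference from 5: small in practical terms
effect <- mean(x_big) - 5
effect

# The p-value is nevertheless extremely small, because n is huge
res$p.value
```

The p-value shrinks because the standard error shrinks with the sample size, not because the effect is large or important.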
We simulate a sample of 40 observations from a model where the true mean is 5, then compute a p-value with a one-sample t-test of that mean.
set.seed(42)
n <- 40
x <- rnorm(n, mean = 5, sd = 1)
pval_example <- t.test(x, mu = 5)$p.value
pval_example
## [1] 0.8389828
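To make the definition concrete, the p-value reported by `t.test()` can be rebuilt by hand from the t statistic. The sketch below assumes the same simulated sample as above; `t_stat` and `p_manual` are names introduced here for illustration.

```r
set.seed(42)
n <- 40
x <- rnorm(n, mean = 5, sd = 1)

# t statistic: distance of the sample mean from 5, in standard-error units
t_stat <- (mean(x) - 5) / (sd(x) / sqrt(n))

# Two-sided p-value: probability of a t statistic at least this extreme
# under a t distribution with n - 1 degrees of freedom
p_manual <- 2 * pt(abs(t_stat), df = n - 1, lower.tail = FALSE)

# Compare with the value computed by t.test()
all.equal(p_manual, t.test(x, mu = 5)$p.value)
```

Here "this extreme or more extreme" becomes the two tail areas beyond the observed `abs(t_stat)`.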
library(ggplot2)

# Histogram of the simulated sample with a kernel density overlay
x_df <- data.frame(x = x)
ggplot(x_df, aes(x = x)) +
  geom_histogram(aes(y = after_stat(density)),
                 bins = 15, fill = "lightblue", color = "black") +
  geom_density(color = "red", linewidth = 1) +
  labs(title = "Simulated Data Distribution",
       x = "Values", y = "Density")
We simulate 4000 p-values, each from a fresh sample of 40 observations, to understand their typical behavior when the model is true.
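# Draw `reps` samples of size `n` from the model (true mean 5)
# and return the t-test p-value of each sample
simulate_pvals <- function(n = 40, reps = 4000) {
  pvals <- replicate(reps, {
    x <- rnorm(n, mean = 5, sd = 1)
    t.test(x, mu = 5)$p.value
  })
  data.frame(pval = pvals)
}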
set.seed(123)
pvals_null <- simulate_pvals()
ggplot(pvals_null, aes(x = pval)) +
  geom_histogram(binwidth = 0.05, boundary = 0,
                 color = "black", fill = "lightblue") +
  labs(title = "Distribution of p-values\nwhen the model is true",
       x = "p-value", y = "Count") +
  xlim(0, 1)
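The roughly flat histogram reflects a known property: when the model is true, p-values from this t-test are uniformly distributed on [0, 1], so about a fraction alpha of them fall below any threshold alpha. A minimal sketch of that check, regenerating the p-values so the block runs on its own (the names `pvals`, `alphas`, and `frac_below` are introduced here):

```r
set.seed(123)

# Re-create p-values under a true model (mean 5), 4000 samples of size 40
pvals <- replicate(4000, t.test(rnorm(40, mean = 5, sd = 1), mu = 5)$p.value)

# Under a true model, the share of p-values below alpha should be close to alpha
alphas <- c(0.01, 0.05, 0.10, 0.50)
frac_below <- sapply(alphas, function(a) mean(pvals < a))

data.frame(alpha = alphas, frac_below = frac_below)
```

In particular, roughly 5% of the p-values fall below 0.05 even though nothing is wrong with the model, which is the false positive rate of a test at that threshold.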
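# Grid of observed means (mu) and sample sizes (n)
mu_vals <- seq(4.5, 5.5, length.out = 30)
n_vals <- seq(20, 100, length.out = 30)
grid <- expand.grid(mu = mu_vals, n = n_vals)
# Two-sided z-test p-value for each grid point, treating mu as the observed
# sample mean, with hypothesized mean 5 and known sd 1
grid$pval <- mapply(function(mu, n) {
  xbar <- mu
  z <- (xbar - 5) / (1 / sqrt(n))
  2 * (1 - pnorm(abs(z)))
}, grid$mu, grid$n)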
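library(plotly)

# A surface trace expects z as a matrix: rows follow y (n), columns follow x (mu).
# expand.grid() varies mu fastest, so filling by row gives that layout.
pval_mat <- matrix(grid$pval, nrow = length(n_vals), byrow = TRUE)

plot_ly(x = mu_vals, y = n_vals, z = pval_mat,
        type = "surface",
        showscale = TRUE) %>%
  layout(
    title = "How p-values Change with the Mean",
    margin = list(l = 0, r = 0, b = 0, t = 40),
    scene = list(
      xaxis = list(title = "Mean μ"),
      yaxis = list(title = "Sample Size n"),
      zaxis = list(title = "p-value")
    )
  ) %>%
  config(displayModeBar = FALSE)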
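Individual points on the surface can be checked directly. The sketch below wraps the same z-test formula in a hypothetical helper, `p_from_mean()` (not part of the original analysis), and evaluates it for one observed mean at two sample sizes.

```r
# Hypothetical helper: two-sided z-test p-value for an observed mean `xbar`,
# assuming a hypothesized mean `mu0` and a known standard deviation `sigma`
p_from_mean <- function(xbar, n, mu0 = 5, sigma = 1) {
  z <- (xbar - mu0) / (sigma / sqrt(n))
  2 * pnorm(abs(z), lower.tail = FALSE)
}

# Same observed mean, different sample sizes
p_from_mean(5.2, n = 20)    # z is about 0.89, so the p-value is large
p_from_mean(5.2, n = 100)   # z = 2, so the p-value is about 0.0455

# An observed mean equal to the hypothesized mean gives p = 1 for any n
p_from_mean(5, n = 50)
```

The same gap of 0.2 between the observed and hypothesized mean yields a much smaller p-value at n = 100 than at n = 20, which is why the surface drops more steeply away from 5 as the sample size grows.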