HW3

2025-11-16

1. What is a p-value?

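\[ p\text{-value} = P(\text{a result at least as extreme as the one observed} \mid \text{the model is true}) \]

A p-value measures how surprising the observed data would be if a baseline model (often called the null hypothesis) were true. "Extreme" is judged with a test statistic. For the two-sided tests used here, larger values of |t| or |z| count as more extreme.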
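The definition can be applied literally by simulation. The sketch below assumes the model is Normal(mean = 5, sd = 1) and uses a hypothetical observed sample `x_obs` drawn with mean 5.2. It simulates the t statistic many times under the model and reports the share of simulated statistics at least as extreme as the observed one. With normal data this Monte Carlo estimate should land close to the exact `t.test()` p-value.

```r
# Monte Carlo p-value, built directly from the definition.
# Assumption: the model is Normal(mean = 5, sd = 1); x_obs is a hypothetical sample.
set.seed(1)
n <- 40
x_obs <- rnorm(n, mean = 5.2, sd = 1)

# One-sample t statistic for H0: mean = mu0
t_stat <- function(x, mu0 = 5) (mean(x) - mu0) / (sd(x) / sqrt(length(x)))
t_obs <- t_stat(x_obs)

# Distribution of the statistic when the model is true
t_null <- replicate(10000, t_stat(rnorm(n, mean = 5, sd = 1)))

# Two-sided: "at least as extreme" means |t| >= |t_obs|
p_mc <- mean(abs(t_null) >= abs(t_obs))
p_exact <- t.test(x_obs, mu = 5)$p.value
c(monte_carlo = p_mc, t_test = p_exact)
```

The simulated p-value carries Monte Carlo error of roughly sqrt(p(1 - p) / 10000), so small differences from the exact value are expected.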

2. Interpreting a p-value

- Small p-value: the data would be unusual if the model were true, which is evidence against the model
- Large p-value: the data are consistent with the model, which does not prove the model is true

What a p-value is NOT:
- Not the probability the model is true
- Not the probability the data happened by chance
- Not a measure of effect size or importance
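The last point is easy to see with a very large sample. This sketch assumes a true mean of 5.01 (a shift of only 0.01 standard deviations) and one million observations. The p-value comes out extremely small even though the standardized effect is negligible.

```r
# Tiny true effect plus a huge sample: small p-value, negligible effect size.
# Assumption: true mean 5.01 vs. null mean 5, sd = 1.
set.seed(2)
x_big <- rnorm(1e6, mean = 5.01, sd = 1)
res <- t.test(x_big, mu = 5)

# Standardized effect: difference from the null mean in sd units
effect <- (mean(x_big) - 5) / sd(x_big)
c(p_value = res$p.value, effect_size = effect)
```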

3. Example: Computing a p-value (code)

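We simulate 40 observations from a normal model with true mean 5 and standard deviation 1, then compute the p-value of a two-sided one-sample t-test of H0: mean = 5. Because the model is true here, a large p-value is the typical outcome.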

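set.seed(42)                                # reproducible random draws
n <- 40                                     # sample size
x <- rnorm(n, mean = 5, sd = 1)             # sample from the model Normal(5, 1)
pval_example <- t.test(x, mu = 5)$p.value   # two-sided one-sample t-test of H0: mean = 5
pval_example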

## [1] 0.8389828
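To see what `t.test()` computes, the same p-value can be rebuilt by hand from t = (x̄ − 5) / (s / √n) and the t distribution with n − 1 degrees of freedom. This sketch regenerates the same sample with the same seed, so `p_manual` should agree with the `t.test()` result up to floating-point rounding.

```r
# One-sample t-test by hand: t = (xbar - mu0) / (s / sqrt(n)),
# two-sided p-value = 2 * P(T_{n-1} >= |t|)
set.seed(42)
n <- 40
x <- rnorm(n, mean = 5, sd = 1)

t_obs <- (mean(x) - 5) / (sd(x) / sqrt(n))
p_manual <- 2 * pt(-abs(t_obs), df = n - 1)

p_manual
all.equal(p_manual, t.test(x, mu = 5)$p.value)
```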

4. ggplot Visualization: Sample Data

library(ggplot2)

x_df <- data.frame(x = x)

# Histogram on the density scale with a kernel density estimate (red) overlaid
ggplot(x_df, aes(x = x)) +
  geom_histogram(aes(y = after_stat(density)),
                 bins = 15, fill = "lightblue", color = "black") +
  geom_density(color = "red", linewidth = 1) +
  labs(title = "Simulated Data Distribution",
       x = "Values", y = "Density")

5. Simulating Many p-values (code)

We repeat the experiment 4000 times, each time drawing a fresh sample from the model (true mean 5) and testing H0: mean = 5, to see how p-values behave when the model is true.

simulate_pvals <- function(n = 40, reps = 4000) {
  # Each replicate: draw a new sample from the model and test H0: mean = 5
  pvals <- replicate(reps, {
    x <- rnorm(n, mean = 5, sd = 1)
    t.test(x, mu = 5)$p.value
  })
  data.frame(pval = pvals)  # one column, ready for ggplot
}

set.seed(123)
pvals_null <- simulate_pvals()
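When the model is true, p-values from this t-test are uniformly distributed on [0, 1], so the share falling below any cutoff α should be close to α. The sketch below repeats the same simulation (same seed and draw order as `simulate_pvals()`, so it should reproduce the same p-values) and checks three common cutoffs.

```r
# Under a true model, p-values are Uniform(0, 1): P(p < alpha) = alpha.
set.seed(123)
pvals <- replicate(4000, t.test(rnorm(40, mean = 5, sd = 1), mu = 5)$p.value)

alphas <- c(0.01, 0.05, 0.10)
rates <- sapply(alphas, function(a) mean(pvals < a))
data.frame(alpha = alphas, observed_rate = rates)
```

This is why α is called the false-positive (Type I error) rate: if the model is true, a test run at α = 0.05 rejects it in about 5% of samples.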

6. ggplot Visualization: Distribution of p-values

ggplot(pvals_null, aes(x = pval)) +
  geom_histogram(binwidth = 0.05, boundary = 0,
                 color = "black", fill = "lightblue") +
  labs(title = "Distribution of p-values\nwhen the model is true",
       x = "p-value", y = "Count") +
  xlim(0, 1)

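7. 3D Plotly: How p-values Change with the Mean and Sample Size

This surface uses a two-sided z-test with known standard deviation 1 (rather than the t-test above) and treats each grid value μ as the observed sample mean, so the p-value depends only on μ and n.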

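library(plotly)  # also re-exports the %>% pipe

mu_vals <- seq(4.5, 5.5, length.out = 30)  # observed sample means
n_vals  <- seq(20, 100, length.out = 30)   # sample sizes (non-integer values smooth the surface)

# Two-sided z-test of H0: mean = 5 with known sd = 1, assuming the observed
# sample mean equals mu. Rows index n (y-axis) and columns index mu (x-axis),
# which is the matrix layout a plotly surface trace expects for z.
pval_mat <- outer(n_vals, mu_vals, function(n, mu) {
  z <- (mu - 5) / (1 / sqrt(n))
  2 * pnorm(-abs(z))
})

plot_ly(x = mu_vals, y = n_vals, z = pval_mat,
        type = "surface",
        showscale = TRUE) %>%
  layout(
    title = "How p-values Change with the Mean and Sample Size",
    margin = list(l = 0, r = 0, b = 0, t = 40),
    scene = list(
      xaxis = list(title = "Sample mean (μ)"),
      yaxis = list(title = "Sample Size n"),
      zaxis = list(title = "p-value")
    )
  ) %>%
  config(displayModeBar = FALSE)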
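The surface is built from a single formula, p = 2 · P(Z ≥ |x̄ − 5| / (σ / √n)). The sketch below wraps it in a hypothetical helper, `p_value_z()`, so a few points can be checked directly: no gap from the null mean gives p = 1, and the same gap gives a smaller p-value at a larger n.

```r
# Two-sided z-test p-value for H0: mean = mu0 with known sd sigma.
# p_value_z is a hypothetical helper, not part of any package.
p_value_z <- function(xbar, n, mu0 = 5, sigma = 1) {
  z <- (xbar - mu0) / (sigma / sqrt(n))
  2 * pnorm(-abs(z))
}

p_value_z(5.0, 20)   # no difference from the null mean: p = 1
p_value_z(5.3, 20)
p_value_z(5.3, 100)  # same difference, larger n: smaller p-value
```

Writing the tail as `2 * pnorm(-abs(z))` is mathematically the same as `2 * (1 - pnorm(abs(z)))`, but it avoids the subtraction, which rounds to 0 when p is very small.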

8. Summary

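- A p-value measures how unusual the data would be if the model were true
- It does not measure the probability that the model is true
- Simulations show that when the model is true, p-values are roughly uniform on [0, 1], so about 5% fall below 0.05
- The 3D surface shows that p-values shrink as the sample mean moves away from 5 or as the sample size grows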