2025-11-15

Slide 1 — What is a p-value?

A p-value is the probability, computed assuming the null hypothesis is true, of observing data at least as extreme as yours.
This presentation explains what a p-value means, includes examples, and provides visuals created in R.

Slide 2 — Formal Definition (LaTeX)

For a one-sided hypothesis test:

\[ H_0: \mu = \mu_0 \qquad H_A: \mu > \mu_0 \]

The test statistic for a one-sample t-test is:

\[ t = \frac{\bar{x} - \mu_0}{s / \sqrt{n}} \]

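The p-value is the upper-tail probability, where \(T\) follows a t-distribution with \(n - 1\) degrees of freedom under \(H_0\):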

\[ p = P(T \ge t_{\text{obs}} \mid H_0) \]

Slide 3 — R Code (Data + Test)

knitr::opts_chunk$set(echo = TRUE)
library(ggplot2)
library(plotly)
library(dplyr)
set.seed(2025)
n <- 20
mu0 <- 5
x <- rnorm(n, mean = 5.8, sd = 1.2)
t_test <- t.test(x, mu = mu0, alternative = "greater")
t_test
## 
##  One Sample t-test
## 
## data:  x
## t = 4.3012, df = 19, p-value = 0.0001927
## alternative hypothesis: true mean is greater than 5
## 95 percent confidence interval:
##  5.626641      Inf
## sample estimates:
## mean of x 
##  6.047919
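
As a cross-check, the test statistic and p-value defined on Slide 2 can be computed by hand and compared with `t.test()`. This is a minimal sketch that regenerates the same sample so it runs on its own; the names `t_manual`, `p_manual`, and `t_builtin` are introduced here for illustration and are not part of the original deck.

```r
# Regenerate the sample from Slide 3 so this chunk is self-contained
set.seed(2025)
n   <- 20
mu0 <- 5
x   <- rnorm(n, mean = 5.8, sd = 1.2)

# t = (x_bar - mu0) / (s / sqrt(n))
t_manual <- (mean(x) - mu0) / (sd(x) / sqrt(n))

# p = P(T >= t_obs | H0), with T ~ t(n - 1)
p_manual <- pt(t_manual, df = n - 1, lower.tail = FALSE)

# Compare with the built-in one-sided test (illustrative names)
t_builtin <- t.test(x, mu = mu0, alternative = "greater")
all.equal(t_manual, unname(t_builtin$statistic))
all.equal(p_manual, t_builtin$p.value)
```

Both comparisons should agree up to floating-point tolerance, since `t.test()` applies the same formulas internally for a one-sample test.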

Slide 4 — ggplot: Null Distribution (Shaded p-value)

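df <- n - 1
t_obs <- unname(t_test$statistic)  # drop the "t" name for cleaner labels

# Extend the grid past the observed t (about 4.3) so the shaded tail is visible
tvals <- seq(-4, max(4, t_obs + 1), length.out = 400)
dens <- dt(tvals, df)
df_plot <- data.frame(t = tvals, density = dens)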

ggplot(df_plot, aes(t, density)) +
  geom_line() +
  geom_area(data = subset(df_plot, t >= t_obs),
            aes(t, density), alpha = 0.35) +
  geom_vline(xintercept = t_obs, linetype="dashed") +
  labs(title="Null t-distribution",
       subtitle=paste("Observed t =", round(t_obs,3)),
       x="t", y="Density")

Slide 5 — ggplot: Sample Data + Mean

dfx <- data.frame(x = x, index = seq_along(x))

ggplot(dfx, aes(index, x)) +
  geom_point(size=2, alpha=0.8) +
  geom_hline(yintercept = mean(x), color="maroon", linewidth = 1) +
  labs(title = "Sample Observations with Mean Line")

Slide 6 — 3D Surface (Plotly)

mu_hat_seq <- seq(4.5, 7.5, length.out = 60)
n_seq <- 5:60
sd_pop <- 1.2
mu0 <- 5

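# One row per (mu_hat, n) pair; mu_hat varies fastest in expand.grid()
grid <- expand.grid(mu_hat = mu_hat_seq, n = n_seq) %>%
  mutate(se = sd_pop / sqrt(n),
         t = (mu_hat - mu0)/se,
         # upper-tail probability, computed directly for numerical accuracy
         p = pt(t, df = n - 1, lower.tail = FALSE))

# Rows index n (y-axis), columns index mu_hat (x-axis)
zmat <- matrix(grid$p, nrow=length(n_seq), ncol=length(mu_hat_seq), byrow=TRUE)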

plot_ly(x=mu_hat_seq, y=n_seq, z=zmat) %>%
  add_surface() %>%
  layout(title="p-value Surface",
         scene=list(xaxis=list(title="Mean"),
                    yaxis=list(title="n"),
                    zaxis=list(title="p-value")))

Slide 7 — Interpreting p-values (LaTeX)

  • A small p-value means data at least as extreme as your sample would be unlikely under \(H_0\).

  • It does not measure the probability that \(H_0\) is true.

  • Interpretation formula:

\[ \text{p-value} = P(\text{data as or more extreme} \mid H_0) \]
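
The interpretation formula can be illustrated by simulation: draw many samples with \(H_0\) true, compute the t statistic for each, and count how often it is at least as large as a reference value. The sketch below is illustrative only. The reference value `t_ref = 2.0`, the seed, and the number of replicates are assumptions chosen for the example, not values from the deck's data.

```r
set.seed(1)        # arbitrary seed, for reproducibility only
n     <- 20        # sample size, as on Slide 3
mu0   <- 5         # null mean, as on Slide 3
t_ref <- 2.0       # hypothetical observed t statistic (assumption)
n_sim <- 20000     # number of simulated samples (assumption)

# Simulate t statistics from samples generated with H0 true (mean = mu0)
t_null <- replicate(n_sim, {
  x_sim <- rnorm(n, mean = mu0, sd = 1.2)
  (mean(x_sim) - mu0) / (sd(x_sim) / sqrt(n))
})

# Share of null samples "as or more extreme" than t_ref
p_sim <- mean(t_null >= t_ref)

# Exact upper-tail probability from the t-distribution
p_exact <- pt(t_ref, df = n - 1, lower.tail = FALSE)

c(simulated = p_sim, exact = p_exact)
```

The simulated share should land close to the exact tail probability. Nothing in this calculation assigns a probability to \(H_0\) itself; \(H_0\) is assumed true throughout.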

Slide 8 — Conclusion

From our sample:

  • Sample mean: 6.048
  • Observed t: 4.301
  • One-sided p-value: \(1.93 \times 10^{-4}\)

At the \(\alpha = 0.05\) significance level, we reject \(H_0\) when \(p < 0.05\).
Here \(p \approx 0.00019 < 0.05\), so we reject the null hypothesis.

Slide 9 — Key Takeaways

  • p-values measure how compatible the data are with the null hypothesis.
  • Larger samples tend to produce smaller p-values for the same nonzero effect.
  • Always report:
    • effect size
    • confidence intervals
    • test statistic
    • p-value
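
One possible way to collect the four reported quantities from a `t.test()` result is sketched below. It regenerates the Slide 3 sample so it runs on its own. The deck does not specify an effect-size measure, so using Cohen's d here is an assumption, and the names `res`, `cohens_d`, and `report` are illustrative.

```r
# Regenerate the sample from Slide 3 so this chunk is self-contained
set.seed(2025)
n   <- 20
mu0 <- 5
x   <- rnorm(n, mean = 5.8, sd = 1.2)
res <- t.test(x, mu = mu0, alternative = "greater")

# Cohen's d: standardized mean difference (an assumed choice of effect size)
cohens_d <- (mean(x) - mu0) / sd(x)

report <- data.frame(
  mean_diff = mean(x) - mu0,
  cohens_d  = cohens_d,
  ci_lower  = res$conf.int[1],  # one-sided test, so the upper bound is Inf
  ci_upper  = res$conf.int[2],
  t         = unname(res$statistic),
  df        = unname(res$parameter),
  p_value   = res$p.value
)
print(report, digits = 4)
```

Because the test is one-sided, the interval has only a lower bound; a two-sided interval would need a separate call with `alternative = "two.sided"`.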

Thank you for your time!