2025-10-26

Overview

  • Goal: Explain what a p-value is and how to understand it in simple terms.
  • We’ll cover:
    • What a p-value means and where it comes from.
    • A quick example using simulated data.
    • A histogram showing how p-values behave across repeated experiments.
    • An interactive plotly example to help visualize the idea.
    • An interactive scatterplot, shown as both code and output.
    • Main takeaways about statistical significance.

What is a p-value?

  • A p-value is the probability, assuming the null hypothesis \(H_0\) is true, of observing data at least as extreme as the data actually observed.
  • Smaller p-values mean the data are less compatible with \(H_0\).
  • By convention, if the p-value is below a chosen threshold (commonly 0.05), we call the result statistically significant and reject \(H_0\).
  • In simple terms: a low p-value means the data would be surprising if \(H_0\) were true, which counts as evidence against \(H_0\).
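
The decision rule above can be sketched with base R's t.test. This is only an illustration: the sample size, the true mean of 0.8, the seed, and the variable names are assumptions chosen for the example, not values from the presentation.

```r
# Illustrative data (assumed values): 30 draws from a normal distribution
# whose true mean is 0.8, so H0: mean = 0 is false by construction.
set.seed(1)
sample_data <- rnorm(30, mean = 0.8, sd = 1)

# One-sample t-test of H0: mean = 0 (two-sided by default)
result <- t.test(sample_data, mu = 0)
p_value <- result$p.value

# Compare the p-value with the conventional 0.05 threshold
alpha <- 0.05
reject_h0 <- p_value < alpha
cat("p-value:", signif(p_value, 3), " reject H0:", reject_h0, "\n")
```

Changing mean = 0.8 to mean = 0 makes \(H_0\) true, and the test should then reject only occasionally.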

Mathematical Definition

For a test statistic \(T(X)\) and an observed value \(t_{obs}\),

\[ p = P(T(X) \ge t_{obs} \mid H_0) \]

If this probability is small, the observed result would be rare if \(H_0\) were true.

In simple terms:
The p-value tells us how likely it would be to see data at least as extreme as ours if the null hypothesis were actually true.
- A small p-value (for example, below 0.05) means a result like ours would rarely arise from random chance alone under \(H_0\).
- A large p-value means our data are consistent with what we’d expect under \(H_0\); it does not prove that \(H_0\) is true.
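
As a worked instance of the definition \(p = P(T(X) \ge t_{obs} \mid H_0)\), the sketch below assumes that the test statistic follows a standard normal distribution under \(H_0\) and that the observed value is 2.1. Both are hypothetical choices made for illustration. It computes the tail probability exactly with pnorm and then approximates it by simulating the statistic under \(H_0\).

```r
# Hypothetical observed test statistic (assumed for illustration)
t_obs <- 2.1

# Exact upper-tail probability, assuming T ~ N(0, 1) under H0
p_exact <- pnorm(t_obs, lower.tail = FALSE)

# Monte Carlo check: simulate T under H0 and count how often T >= t_obs
set.seed(123)
t_sim <- rnorm(100000)
p_mc <- mean(t_sim >= t_obs)

cat("exact p-value:    ", round(p_exact, 4), "\n")
cat("simulated p-value:", round(p_mc, 4), "\n")
```

The exact value is about 0.018, and the simulated proportion should land close to it: the p-value is simply the share of "null worlds" in which the statistic is at least as large as the one observed.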

Example 1 – Simulating p-values
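
A minimal sketch of how such a simulation might look, using base R only. The number of experiments, the group size, the effect size of 0.5, and the helper name simulate_p are assumptions made for this illustration and may differ from the original slide.

```r
set.seed(42)
n_experiments <- 2000   # number of repeated experiments (assumed)
n_per_group <- 30       # observations per group (assumed)

# Hypothetical helper: run one two-sample t-test and return its p-value.
# `effect` is the true difference in means between the two groups.
simulate_p <- function(effect) {
  control <- rnorm(n_per_group, mean = 0)
  treated <- rnorm(n_per_group, mean = effect)
  t.test(treated, control)$p.value
}

# p-values when H0 is true (no effect) and when it is false (effect = 0.5)
p_null <- replicate(n_experiments, simulate_p(effect = 0))
p_alt <- replicate(n_experiments, simulate_p(effect = 0.5))

# Histogram of p-values under H0: roughly flat between 0 and 1
hist(p_null, breaks = 20, col = "steelblue",
     main = "p-values when H0 is true", xlab = "p-value")

# Share of experiments with p < 0.05 in each scenario
false_positive_rate <- mean(p_null < 0.05)
power_estimate <- mean(p_alt < 0.05)
cat("Rejections under H0:", false_positive_rate, "\n")
cat("Rejections under the alternative:", power_estimate, "\n")
```

When \(H_0\) is true, p-values are approximately uniform, so about 5% of experiments fall below 0.05 by chance alone. When a real effect exists, the p-values pile up near zero and the rejection rate rises.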

Example 2 – Code for Interactive Scatterplot

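library(plotly)  # provides plot_ly(), layout(), and the %>% pipe

# Simulate 50 points where y depends linearly on x plus random noise
set.seed(2)  # fix the seed so the plot is reproducible
x <- rnorm(50)
y <- x + rnorm(50)

# Build the interactive scatterplot and label the title and axes
plot_ly(x = ~x, y = ~y, type = "scatter", mode = "markers",
        marker = list(size = 10, color = "darkgreen"), height = 400) %>%
  layout(title = "Interactive Scatterplot of Simulated Data",
         xaxis = list(title = "X values"),
         yaxis = list(title = "Y values"))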

Example 2 – Scatterplot visual

Thank you!

## Presentation created by Talall Alabadi using R Markdown