P-value in Hypothesis Testing

2025-03-16

Introduction

A p-value is the probability of obtaining results at least as extreme as those observed, assuming the null hypothesis is true. It is used to judge whether the data provide enough evidence to reject the null hypothesis in a statistical test.

Mathematical Foundations

Consider a test statistic T (e.g., the number of heads in a series of coin flips) under a null hypothesis H0. For a one-sided (upper-tail) test with observed value t_obs, the p-value is defined as:

\[ \text{p-value} = P\bigl(T \ge t_{\mathrm{obs}} \mid H_0\bigr) \]

The Binomial Setting

If X ~ Binomial(n,p) under H0, the p-value when observing x successes is:

\[ \text{p-value} = \sum_{k=x}^{n} \binom{n}{k} \, p^k \, (1 - p)^{\,n - k} \]

This is the upper-tail probability (assuming a one-tailed test).
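
As a minimal sketch of the formula above, the upper-tail sum can be evaluated directly in R. The values x = 70, n = 100 and p0 = 0.5 are illustrative assumptions (p0 is a name introduced here for the null-hypothesis probability), and the three routes shown should agree up to floating-point error.

```r
# Exact upper-tail p-value for x successes out of n under H0: p = p0.
# x, n and p0 are illustrative values chosen for this sketch.
x  <- 70
n  <- 100
p0 <- 0.5

# Direct evaluation of the sum over k = x, ..., n
p_sum <- sum(dbinom(x:n, size = n, prob = p0))

# Same quantity via the upper tail of the CDF: P(X >= x) = P(X > x - 1)
p_tail <- pbinom(x - 1, size = n, prob = p0, lower.tail = FALSE)

# binom.test reports the same one-sided p-value
p_test <- binom.test(x, n, p = p0, alternative = "greater")$p.value

print(c(sum = p_sum, tail = p_tail, test = p_test))
```

Note the `x - 1` in the `pbinom` call: with `lower.tail = FALSE` the function returns P(X > q), so the observed count itself is included only if q is set one below it.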

Data Example

Below, we simulate coin-flip data for a fair coin (p = 0.5). Each experiment consists of n = 100 flips. We repeat the experiment 1000 times and compute the proportion of experiments that produce at least 70 heads.

set.seed(123)
n <- 100
p <- 0.5

# Simulate 1000 experiments of 100 flips each
sim_data <- rbinom(1000, size=n, prob=p)

# Proportion of experiments with 70 or more heads (Monte Carlo estimate of P(X >= 70))
p_value_est <- mean(sim_data >= 70)

cat("P-value estimate for ≥70 heads in 100 flips:", p_value_est, "\n")

## P-value estimate for ≥70 heads in 100 flips: 0

Interpretation

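This empirical p-value estimates how rare it is to observe at least 70 heads if the coin were fair. An estimate of 0 does not mean the event is impossible: with only 1000 replications, the smallest nonzero proportion the simulation can report is 0.001, so the result only indicates that the true probability is likely well below that.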
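
To see how the number of replications limits the estimate, the sketch below reruns the simulation with more experiments and compares the result with the exact binomial tail. The replication count of 1e6 and the names `n_flips`, `n_sims` and `heads` are assumptions made for illustration, not part of the original example.

```r
set.seed(123)
n_flips <- 100
n_sims  <- 1e6   # assumed replication count, chosen for illustration

# One simulated head count per experiment
heads <- rbinom(n_sims, size = n_flips, prob = 0.5)
p_hat <- mean(heads >= 70)

# Approximate Monte Carlo standard error of the estimated proportion
se_hat <- sqrt(p_hat * (1 - p_hat) / n_sims)

# Exact upper-tail probability P(X >= 70) for comparison
p_exact <- pbinom(69, size = n_flips, prob = 0.5, lower.tail = FALSE)

cat("Estimate:", p_hat, " SE:", se_hat, " Exact:", p_exact, "\n")
```

For an event this rare, the exact calculation is both cheaper and more precise than simulation. Simulation is mainly useful when the null distribution of the statistic has no convenient closed form.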

2D Visualization with ggplot (Histogram)

We can visualize the distribution of the number of heads across the 1000 simulated experiments.

library(ggplot2)

sim_df <- data.frame(Heads = sim_data)

ggplot(sim_df, aes(x = Heads)) +
  geom_histogram(binwidth = 1, fill = "blue", color = "black") +
  geom_vline(xintercept = 70, color = "red", linetype = "dashed") +
  ggtitle("Distribution of Heads (100 flips per experiment)") +
  xlab("Number of Heads") +
  ylab("Frequency")

Another ggplot: Bar Plot of Counts Above Threshold

Let’s categorize each experiment as “≥70 heads” or “<70 heads” and visualize the counts. Because none of the simulated experiments reached 70 heads, only the “<70 Heads” bar appears.

sim_df$Above70 <- ifelse(sim_df$Heads >= 70, "≥70 Heads", "<70 Heads")

ggplot(sim_df, aes(x = Above70)) +
  geom_bar(fill = "lightblue", color = "black") +
  ggtitle("Experiments with ≥70 Heads vs. <70 Heads") +
  xlab("") +
  ylab("Count of Experiments")

3D Visualization with Plotly

To explore a 3D perspective, suppose we vary the probability of heads p from 0 to 1 (x-axis) and the number of heads k from 0 to 100 (y-axis). We then plot the binomial probability mass P(X = k) (z-axis).

library(plotly)

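x_vals <- seq(0, 1, by = 0.05)   # probability of heads p
y_vals <- seq(0, 100, by = 5)    # number of heads k
# A plotly surface reads z[i, j] as the height at (x[j], y[i]),
# so rows must index k (y-axis) and columns must index p (x-axis).
z_vals <- outer(y_vals, x_vals, function(k, prob) dbinom(k, size = n, prob = prob))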

plot_ly(x = x_vals, y = y_vals, z = z_vals, type = "surface") %>%
  layout(title = "3D Binomial Distribution Surface",
         scene = list(xaxis = list(title = "Probability (p)"),
                      yaxis = list(title = "Heads (k)"),
                      zaxis = list(title = "P(X = k)")))
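
Since a p-value is a tail probability rather than a point mass, a closely related surface is P(X ≥ k) over the same grid. The sketch below only builds that matrix in base R. The names `p_grid`, `k_grid` and `tail_prob` are introduced here for illustration, and the matrix is laid out with rows indexing k and columns indexing p, which is the orientation a plotly surface expects for `z`.

```r
n_flips <- 100
p_grid  <- seq(0, 1, by = 0.05)   # probability of heads p
k_grid  <- seq(0, 100, by = 5)    # threshold number of heads k

# Upper-tail probability P(X >= k) = P(X > k - 1) for every (k, p) pair.
# Rows index k and columns index p.
tail_prob <- outer(k_grid, p_grid,
                   function(k, prob) pbinom(k - 1, size = n_flips, prob = prob,
                                            lower.tail = FALSE))

# Look up P(X >= 70) for a fair coin
row_70  <- match(70, k_grid)
col_0.5 <- which.min(abs(p_grid - 0.5))
p70 <- tail_prob[row_70, col_0.5]
print(p70)
```

Passing `tail_prob` as `z`, with `p_grid` as `x` and `k_grid` as `y`, would give a surface whose height at (0.5, 70) is the exact one-sided p-value for the coin example.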

Compact Version of the Simulation

set.seed(123)
n <- 100
p <- 0.5
sim_data <- rbinom(1000, size = n, prob = p)
p_value_est <- mean(sim_data >= 70)
p_value_est

## [1] 0

This code repeats the simulation above in compact form and returns the same estimate, 0, for the probability of observing 70 or more heads.

Conclusion

The p-value is a key concept in hypothesis testing: it quantifies how extreme the observed results are under H0. We simulated a coin-toss scenario to show how a p-value can be estimated empirically. The visualizations (2D ggplot, 3D Plotly) illustrate the underlying binomial distribution and how far into its tail an outcome such as 70 heads lies.

Takeaways:

A small p-value (below the chosen significance level, conventionally 0.05) suggests rejecting H0.
A large p-value indicates insufficient evidence to reject H0. It does not show that H0 is true.
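
The two takeaways can be sketched as a simple decision rule. The helper `decide()` and the default level `alpha = 0.05` are hypothetical, introduced only for illustration. The p-values come from one-sided exact binomial tests for two example outcomes out of 100 flips.

```r
# Hypothetical helper: compare a p-value with a chosen significance level alpha.
decide <- function(p_value, alpha = 0.05) {
  if (p_value < alpha) "reject H0" else "fail to reject H0"
}

# One-sided exact p-values for two illustrative outcomes out of 100 flips
p_70 <- binom.test(70, 100, p = 0.5, alternative = "greater")$p.value
p_55 <- binom.test(55, 100, p = 0.5, alternative = "greater")$p.value

cat("70 heads: p =", p_70, "->", decide(p_70), "\n")
cat("55 heads: p =", p_55, "->", decide(p_55), "\n")
```

With 70 heads the p-value falls far below 0.05, so the rule rejects H0. With 55 heads it does not, which means only that the evidence is insufficient.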