2025-03-25

Introduction to P-Value

  • The p-value is the probability of getting a result as extreme as, or more extreme than, the one observed in the sample, assuming \(H_0\) is true
  • The p-value approach is one way to carry out a hypothesis test: it measures the strength of evidence against the null hypothesis

Hypothesis Testing

  • First, we assume a statement about a population is true (the null hypothesis - \(H_0\))
  • Then we want to determine if there is evidence to suggest that something else about the population is true (the alternative hypothesis - \(H_a\))
  • We use different tests based on what evidence we wish to find. Sample data is used in these tests to draw conclusions about the population
  • One-tail tests:
    • Left-tail: \(H_0: \mu = \mu_0 \quad \text{vs.} \quad H_a: \mu < \mu_0\)
    • Right-tail: \(H_0: \mu = \mu_0 \quad \text{vs.} \quad H_a: \mu > \mu_0\)
  • Two-tail test: \(H_0: \mu = \mu_0 \quad \text{vs.} \quad H_a: \mu \neq \mu_0\)

Significance Level

  • The significance level (denoted \(\alpha\)) is the probability of rejecting the null hypothesis when it is actually true
    • For example, at a significance level of 0.05, there is a 5% chance of rejecting the null hypothesis when it is in fact true
  • Common significance levels include 0.05, 0.01, and 0.005
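
The meaning of \(\alpha\) as a false-rejection rate can be checked with a small simulation. The sketch below is illustrative only: the seed, sample size, number of simulations, and the choice of a two-tail z-test with known \(\sigma\) are all assumptions made for the example, not part of the original slides.

```r
# Simulation sketch: when H0 is true, a test at level alpha should reject
# in roughly a fraction alpha of repeated samples.
set.seed(1)          # arbitrary seed, for reproducibility only

alpha  <- 0.05
mu_0   <- 0          # hypothesized mean (H0 is true in this simulation)
sigma  <- 1          # population standard deviation, assumed known
n      <- 30         # sample size per simulated study (assumed)
n_sims <- 10000      # number of simulated studies (assumed)

# For each simulated sample, compute the z statistic and a two-tail p-value
p_values <- replicate(n_sims, {
  x <- rnorm(n, mean = mu_0, sd = sigma)
  z <- (mean(x) - mu_0) / (sigma / sqrt(n))
  2 * pnorm(-abs(z))
})

# Proportion of simulated studies that (incorrectly) reject H0
type_1_rate <- mean(p_values < alpha)
print(type_1_rate)
```

The printed proportion should land close to 0.05, with some simulation noise.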

Rejection Region

  • On the sampling distribution curve of the test statistic (drawn assuming \(H_0\) is true), the rejection region is the tail area equal to \(\alpha\)
  • If the test statistic falls within the rejection region, we reject the null hypothesis
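
For a z-test, the boundary of the rejection region (the critical value) can be read off the standard normal distribution with `qnorm()`. The following is a minimal sketch assuming \(\alpha = 0.05\); the variable names are chosen for illustration.

```r
alpha <- 0.05

# Critical values on the standard normal (z) scale
crit_left  <- qnorm(alpha)          # left-tail test: reject if z < crit_left
crit_right <- qnorm(1 - alpha)      # right-tail test: reject if z > crit_right
crit_two   <- qnorm(1 - alpha / 2)  # two-tail test: reject if |z| > crit_two

round(c(left = crit_left, right = crit_right, two_tail = crit_two), 3)
```

These come out to approximately -1.645, 1.645, and 1.960. The two-tail value is larger in magnitude because \(\alpha\) is split evenly between the two tails.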

What does the P-Value Mean?

  • The p-value is the area under the curve in the tail (or tails) beyond the observed test statistic, in the direction given by \(H_a\)
  • The test statistic (z-score) for the z-test is given by \(Z = \frac{\bar{X} - \mu_0}{\frac{\sigma}{\sqrt{n}}}\), where \(\bar{X}\) is the sample mean, \(\mu_0\) is the population mean under \(H_0\), \(\sigma\) is the population standard deviation, and \(n\) is the sample size
  • If the p-value is smaller than the significance level, the null hypothesis (\(H_0\)) is rejected
  • If \(H_0\) is rejected, then there is sufficient evidence to suggest that the population is different in some way (as specified in the alternative hypothesis)
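
The steps above can be sketched in R as follows. `z_test()` is a hypothetical helper written for this example (it is not a base R function), and the input numbers are made up; they are chosen so that the test statistic comes out to exactly 2.

```r
# Hypothetical helper: z statistic and p-value for a one-sample z-test
# with known population standard deviation sigma.
z_test <- function(x_bar, mu_0, sigma, n,
                   alternative = c("two.sided", "less", "greater")) {
  alternative <- match.arg(alternative)
  z <- (x_bar - mu_0) / (sigma / sqrt(n))
  p <- switch(alternative,
    less      = pnorm(z),                      # area to the left of z
    greater   = pnorm(z, lower.tail = FALSE),  # area to the right of z
    two.sided = 2 * pnorm(-abs(z))             # area in both tails
  )
  list(z = z, p_value = p)
}

# Illustrative (made-up) data: sample mean 103, H0 mean 100, sigma 15, n 100
res <- z_test(x_bar = 103, mu_0 = 100, sigma = 15, n = 100,
              alternative = "greater")
res$z               # 2
res$p_value         # about 0.0228
res$p_value < 0.05  # TRUE, so H0 is rejected at alpha = 0.05
```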

Visualizing Right-Tail P-Value on a Normal Distribution Curve (R code)

library(ggplot2)

# Data for the normal distribution
x_vals <- seq(-4, 4, length.out = 100)
y_vals <- dnorm(x_vals)

# Value of z where the shading starts (the observed test statistic for a right-tail p-value)
critical_value <- 2

# Create the first ggplot for the normal distribution with shaded p-value region
p_value <- ggplot(data = data.frame(x = x_vals, y = y_vals), aes(x = x, y = y)) +
  geom_line(color = "#355c7d", linewidth = 1) +  # Normal distribution line (linewidth replaces the deprecated size)
  geom_area(data = subset(data.frame(x = x_vals, y = y_vals), x >= critical_value),
            aes(x = x, y = y), fill = "#f67280", alpha = 0.5) +  # Shaded p-value area
  labs(title = "Normal Distribution with P-value Region",
       x = "Test Statistic (z)", y = "Density") +
  theme_minimal()

Visualizing Right-Tail P-Value on a Normal Distribution Curve

Visualizing Two-Tail Rejection Region on a Normal Distribution Curve

One-Tailed Test Example (Right-Tail)

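  • In the plot, the p-value region (the area to the right of the observed test statistic) is smaller than the rejection region (the area \(\alpha\) to the right of the critical value), so we reject \(H_0\)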

Two-Tailed Test Example

  • As in the previous example, the plot marks the z-values that bound the regions; in a two-tailed test the rejection region is split between both tails, with area \(\alpha/2\) in each
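
A two-tailed decision can be checked numerically in two equivalent ways: compare the p-value with \(\alpha\), or compare the test statistic with the critical value. The sketch below assumes an observed test statistic of 2 and \(\alpha = 0.05\); both are illustrative values, not results from the slides.

```r
alpha <- 0.05
z_obs <- 2   # illustrative observed test statistic (assumed)

# Two-tail p-value: area beyond |z_obs| in both tails
p_two <- 2 * pnorm(-abs(z_obs))   # about 0.0455

# Two-tail critical value: alpha / 2 in each tail
crit <- qnorm(1 - alpha / 2)      # about 1.96

reject_by_p_value <- p_two < alpha       # p-value approach
reject_by_region  <- abs(z_obs) > crit   # rejection-region approach

c(p_value = reject_by_p_value, region = reject_by_region)
```

Both checks return TRUE here, and they always agree with each other, because the p-value falls below \(\alpha\) exactly when the test statistic falls inside the rejection region.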

Conclusion

  • The p-value helps us assess the strength of evidence against the null hypothesis (\(H_0\))
  • A small p-value (<\(\alpha\)) indicates strong evidence against \(H_0\), so we reject \(H_0\)
  • The rejection region represents the area under the curve where test statistic values are extreme enough to reject \(H_0\)
  • The significance level (\(\alpha\)) determines the size of the rejection region