Understanding the p-value

2026-03-08

What is the p-value?

The p-value is the probability of obtaining a test result at least as extreme as the observed result.

Used in hypothesis testing
Assumes the null hypothesis is true
Used in experiments, machine learning, and data analysis

What is a Hypothesis Test?

\[H_0\]

The null hypothesis: the default claim that there is no effect, relationship, or difference between the measured variables or populations

\[H_a\]

The alternative hypothesis: the counter claim that there is an effect, relationship, or difference between the measured variables or populations

Example:

H₀: μ = 20 (the mean of the data is equal to 20)
H_a: μ > 20 (the mean of the data is greater than 20)

p-value using the t-test

When performing a t-test for a mean, we compute the t-statistic:

\[ t = \frac{\bar{x} - \mu_0}{s / \sqrt{n}} \]

Where x̄ is the sample mean, μ₀ is the hypothesized population mean, s is the sample standard deviation, and n is the sample size.

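This formula measures how far the sample mean is from the hypothesized mean, in units of the standard error. Given the t-value, the p-value is found from the t-distribution with n − 1 degrees of freedom, using a t-table or software. For large n, the t-distribution is close to the standard normal distribution, so a standard normal table gives a reasonable approximation unless the statistic falls far out in the tails.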
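The t-statistic formula can be sketched in a few lines of Python using only the standard library. The function name `t_statistic` and the five sample values are invented for illustration; they are not the data behind the example later in this document.

```python
import math
import statistics

def t_statistic(sample, mu0):
    """Return (t, df) for a one-sample t-test of the mean against mu0."""
    n = len(sample)
    x_bar = statistics.mean(sample)
    s = statistics.stdev(sample)        # sample standard deviation (n - 1 denominator)
    standard_error = s / math.sqrt(n)   # estimated standard deviation of the sample mean
    return (x_bar - mu0) / standard_error, n - 1

# Hypothetical measurements, invented for illustration only
sample = [21.0, 23.0, 25.0, 22.0, 24.0]
t, df = t_statistic(sample, mu0=20)
print(f"t = {t:.4f}, df = {df}")
```

The degrees of freedom are returned alongside t because they select which t-distribution the p-value is read from.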

Decision Rule

We choose a significance level, alpha, before running the test and compare the p-value to it. Most often alpha is 0.05, which corresponds to a 95% confidence level. A p-value smaller than alpha means that data at least as extreme as those observed would be unlikely if the null hypothesis were true.

\[ \alpha = 0.05 \]

If the p-value is less than alpha, reject the null hypothesis.

Otherwise, fail to reject the null hypothesis.

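We never accept the null hypothesis, because it may still be false: failing to reject it only means the data do not provide enough evidence against it. We can only say that the observed data would be unlikely if the null hypothesis were true (reject) or that they would not be unlikely (fail to reject).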
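The decision rule above can be written as a small function. This is a minimal sketch: the name `decide` and the two p-values passed to it are hypothetical, and the strict `<` comparison follows the "less than alpha" wording of the rule.

```python
def decide(p_value, alpha=0.05):
    """Apply the decision rule: reject H0 only when p_value < alpha."""
    if not 0.0 <= p_value <= 1.0:
        raise ValueError("p_value must be between 0 and 1")
    if p_value < alpha:
        return "reject H0"
    # Note the wording: we fail to reject, we do not accept
    return "fail to reject H0"

print(decide(0.01))  # hypothetical p-value below alpha
print(decide(0.20))  # hypothetical p-value above alpha
```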

Example t-test

We will perform an example two-sided, one-sample t-test of whether the population mean differs from 20, using a sample of 100 observations.

## 
##  One Sample t-test
## 
## data:  sample_data
## t = 7.8923, df = 99, p-value = 4.079e-12
## alternative hypothesis: true mean is not equal to 20
## 95 percent confidence interval:
##  22.65333 24.43555
## sample estimates:
## mean of x 
##  23.54444

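The p-value, 4.079e-12, is far below α = 0.05, and the 95% confidence interval (22.65, 24.44) does not contain the hypothesized value μ = 20. So we reject the null hypothesis. Because the test is two-sided, the formal conclusion is that the true mean differs from 20; the sample mean of 23.54 and an interval lying entirely above 20 indicate that it is greater.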
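As a consistency check, the confidence interval can be recovered from the other numbers in the output. The critical value t₀.₉₇₅,₉₉ ≈ 1.984 does not appear in the output; it is taken from a standard t-table for 99 degrees of freedom. The standard error (SE) is backed out of the reported t-statistic:

\[ SE = \frac{\bar{x} - \mu_0}{t} = \frac{23.54444 - 20}{7.8923} \approx 0.4491 \]

\[ \bar{x} \pm t_{0.975,\,99} \cdot SE \approx 23.544 \pm 1.984 \times 0.4491 \approx 23.544 \pm 0.891 \approx (22.65,\ 24.44) \]

This agrees with the reported interval, and both endpoints lie above 20.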

Graph of the Sample

The following is a histogram of the distribution of the sample.

The red line marks the hypothesized mean under H₀, μ = 20.

Test Statistic Distribution

The following is the distribution of the test statistic under the null hypothesis: a t-distribution with 99 degrees of freedom, which is close to a standard normal distribution.

The red line is the observed test statistic, and the area beyond it in the tails is the p-value. For the two-sided test above, both tails count.
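The tail area can be approximated numerically. The sketch below integrates the t-density beyond the observed statistic with Simpson's rule, using only Python's standard library, and compares the result with the standard normal tail. The helper names `t_pdf` and `t_upper_tail` and the integration settings (`width`, `steps`) are ad hoc choices for this illustration, not part of the original analysis; a statistics package would normally supply the t-distribution directly.

```python
import math

def t_pdf(x, df):
    """Density of Student's t-distribution with df degrees of freedom."""
    log_norm = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_norm - (df + 1) / 2 * math.log1p(x * x / df))

def t_upper_tail(t, df, width=60.0, steps=20000):
    """Approximate P(T > t) by Simpson's rule on [t, t + width].

    Assumes the density beyond t + width is negligible, which is
    reasonable for moderate-to-large df but not for very small df.
    """
    h = width / steps  # steps must be even for Simpson's rule
    total = t_pdf(t, df) + t_pdf(t + width, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(t + i * h, df)
    return total * h / 3

t_obs, df = 7.8923, 99  # values reported in the t-test output
p_t = 2 * t_upper_tail(abs(t_obs), df)            # two-sided, t-distribution
p_normal = math.erfc(abs(t_obs) / math.sqrt(2))   # two-sided, standard normal
print(f"t-distribution:  {p_t:.3e}")
print(f"standard normal: {p_normal:.3e}")
```

The t-distribution result should land close to the reported p-value of 4.079e-12, while the standard normal tail is roughly three orders of magnitude smaller. This far out in the tails the two distributions are not interchangeable, even with 99 degrees of freedom.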

Interactive Plotly Histogram

This Plotly histogram allows interactive exploration of the sample data.