The p-value is the probability of obtaining a test result at least as extreme as the observed result.
- Used in hypothesis testing
- Assumes the null hypothesis is true
- Used in experiments, machine learning, and data analysis
2026-03-08
\[H_0\]
The null hypothesis: the default claim that there is no effect, relationship, or difference between the measured variables or populations
\[H_a\]
The alternative hypothesis: the counterclaim that there is an effect, relationship, or difference between the measured variables or populations
Example:
When performing a one-sample t-test for a mean, we compute the t-statistic:
\[ t = \frac{\bar{x} - \mu_0}{s / \sqrt{n}} \]
Where x̄ is the sample mean, μ0 is the hypothesized population mean, s is the sample standard deviation, and n is the sample size.
This formula measures how far the sample mean is from the hypothesized mean, in units of the estimated standard error. Given the t-value, the p-value is found from the t-distribution with n − 1 degrees of freedom, using a t-table or statistical software. For large n, this distribution is close to the standard normal, so a standard normal table gives a good approximation.
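The computation of the t-statistic and its two-sided p-value can be sketched in Python using only the standard library. This is a minimal illustration, not the code behind the output in this document: the function names, the sample values, and the choice to integrate the t-density numerically (rather than call a statistics package) are all assumptions made for the example.

```python
import math
import statistics


def t_statistic(sample, mu0):
    """One-sample t statistic: (mean - mu0) / (s / sqrt(n))."""
    n = len(sample)
    mean = statistics.mean(sample)
    s = statistics.stdev(sample)  # sample standard deviation (n - 1 denominator)
    return (mean - mu0) / (s / math.sqrt(n))


def t_pdf(x, df):
    """Density of Student's t-distribution with df degrees of freedom."""
    log_norm = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_norm - (df + 1) / 2 * math.log1p(x * x / df))


def two_sided_p_value(t, df, steps=20000):
    """Two-sided p-value, 2 * P(T >= |t|), by midpoint-rule integration.

    The substitution x = |t| + s / (1 - s) maps the infinite upper tail
    onto the finite interval [0, 1), which the midpoint rule can handle.
    """
    t = abs(t)
    h = 1.0 / steps
    tail = 0.0
    for i in range(steps):
        s = (i + 0.5) * h
        x = t + s / (1.0 - s)
        tail += t_pdf(x, df) / (1.0 - s) ** 2
    return min(1.0, 2.0 * tail * h)


# Hypothetical data, for illustration only.
sample = [21.3, 24.8, 19.6, 26.1, 22.7, 25.4, 20.9, 23.8]
t = t_statistic(sample, mu0=20)
p = two_sided_p_value(t, df=len(sample) - 1)
print(f"t = {t:.4f}, p-value = {p:.4g}")
```

Integrating the tail directly, instead of computing one minus the central area, keeps very small p-values from being lost to rounding error.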
Before running the test, we choose a significance level, alpha, to compare the p-value against. Most often it is 0.05, which corresponds to a 95% confidence level. A p-value smaller than alpha means that data at least as extreme as those observed would be unlikely if the null hypothesis were true.
\[ \alpha = 0.05 \]
If the p-value is less than alpha, reject the null hypothesis.
Otherwise, fail to reject the null hypothesis.
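The decision rule can be written as a small function. This is only a sketch: the function name `decide` and the returned strings are hypothetical, chosen for illustration.

```python
def decide(p_value, alpha=0.05):
    """Apply the decision rule: reject H0 only when p_value < alpha."""
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must lie strictly between 0 and 1")
    if not 0.0 <= p_value <= 1.0:
        raise ValueError("p_value must lie between 0 and 1")
    return "reject H0" if p_value < alpha else "fail to reject H0"


print(decide(0.003))  # p-value below alpha
print(decide(0.20))   # p-value above alpha
```

The wording "fail to reject" is deliberate: a large p-value does not show that the null hypothesis is true, only that the data do not contradict it strongly enough.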
We will perform an example t-test testing whether the mean of a sample differs from 20.
    ## 
    ##  One Sample t-test
    ## 
    ## data:  sample_data
    ## t = 7.8923, df = 99, p-value = 4.079e-12
    ## alternative hypothesis: true mean is not equal to 20
    ## 95 percent confidence interval:
    ##  22.65333 24.43555
    ## sample estimates:
    ## mean of x 
    ##  23.54444
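The 95% confidence interval (corresponding to α = 0.05) is [22.65, 24.44], which does not contain the hypothesized value μ = 20 from H0, and the p-value of 4.079e-12 is far below α. So we reject the null hypothesis: the data give strong evidence that the true mean differs from 20, and the sample mean of 23.54 indicates that it is higher.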
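For a two-sided t-test at level α, rejecting when the p-value is below α is equivalent to the (1 − α) confidence interval excluding the hypothesized mean. As a rough consistency check, the reported numbers can be plugged back into the formulas. The variable names below are illustrative, and n = 100 is inferred from df = 99 rather than stated in the output.

```python
import math

# Values copied from the reported t-test output.
mu0 = 20.0
t_reported = 7.8923
n = 100                      # inferred: df = 99 implies n = df + 1
sample_mean = 23.54444
ci_low, ci_high = 22.65333, 24.43555
p_reported = 4.079e-12
alpha = 0.05

# Standard error implied by t = (mean - mu0) / se.
se = (sample_mean - mu0) / t_reported
# Sample standard deviation implied by se = s / sqrt(n).
s = se * math.sqrt(n)
# Critical value implied by the interval: half-width = critical value * se.
implied_crit = (ci_high - ci_low) / 2 / se

reject_by_p = p_reported < alpha
reject_by_ci = not (ci_low <= mu0 <= ci_high)

print(f"se = {se:.4f}, s = {s:.3f}, implied critical value = {implied_crit:.3f}")
print(f"reject by p-value: {reject_by_p}, reject by interval: {reject_by_ci}")
```

The implied critical value should land close to the 0.975 quantile of the t-distribution with 99 degrees of freedom (about 1.98), and the two rejection checks should agree.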
The following is a histogram of the distribution of the sample.
The red line is the hypothesized mean H0: μ = 20
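The following is the distribution of the test statistic under the null hypothesis: a t-distribution with 99 degrees of freedom, which at this sample size is nearly indistinguishable from a standard normal distribution.
The red line marks the observed test statistic, and the area beyond it is the p-value (for this two-sided test, the matching area in the opposite tail is counted as well).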
This Plotly histogram allows interaction with the data