2026-04-13

Intro Slide - What is a P-value?




What are P-values?


In textbook terms, a p-value measures how likely it is to see data at least as extreme as yours if the null hypothesis is true. Put simply, it asks: is this “extreme” result plausibly due to random chance, or is it unlikely enough that we should question the null hypothesis?

What is a Null Hypothesis?

For some background, if you don’t know what a null hypothesis is, it’s simply the assumption that nothing special is happening: no effect and no difference between groups.


A high p-value indicates that the results are compatible with the null hypothesis (we fail to reject it), meaning that chance alone could plausibly explain them. A low p-value, however, indicates that the data would be unlikely if the null hypothesis were true, which counts as evidence for a real effect or difference.

Normal Distribution, extreme regions (ggplot)

> This graph shows the distribution of values we expect if the null hypothesis is true, and the shaded red regions represent outcomes that are considered “extreme.” The p-value is exactly this: the probability of landing in these extreme regions, that is, how likely a result at least as extreme as ours is under the null hypothesis.
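The shaded tail areas can also be computed directly. This is a minimal sketch in base R, assuming a standard normal null distribution and a two-sided 5% cutoff; the observed z-score of 2.5 is a made-up value for illustration and does not come from the slides.

```r
# Assumed significance level for the shaded "extreme" regions
alpha <- 0.05

# Two-sided cutoff on a standard normal curve (about 1.96)
cutoff <- qnorm(1 - alpha / 2)

# Total area in both tails beyond the cutoff: P(|Z| >= cutoff) under the null
tail_area <- 2 * pnorm(-cutoff)

# Two-sided p-value for a hypothetical observed z-score (illustrative only)
z_obs <- 2.5
p_value <- 2 * pnorm(-abs(z_obs))

print(c(cutoff = cutoff, tail_area = tail_area, p_value = p_value))
```

Because the hypothetical z-score lies beyond the cutoff, its p-value falls below the assumed 0.05 level.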

Example: MPG by Transmission

> Here, we are comparing the miles per gallon (MPG) of automatic and manual cars. The null hypothesis assumes there is no difference between the two groups. We will use a statistical test and its p-value to judge whether the observed difference could plausibly be due to random chance.

P-value for the last graph

## [1] 0.001373638

After calculating the p-value for this comparison, we can see that the result is about 0.001. By convention, a p-value below 0.05 is treated as statistically significant, meaning the data would be unlikely under the null hypothesis. This value (0.001373638) is FAR below that threshold, so we reject the null hypothesis: there is likely a real difference in MPG between automatic and manual cars.
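The slide shows only the resulting number, not the call that produced it. As a minimal sketch, assuming the value comes from Welch's two-sample t-test (the default in `t.test`) on R's built-in `mtcars` data, where `am` is 0 for automatic and 1 for manual:

```r
# Assumption: the slide's p-value comes from a Welch two-sample t-test
# comparing mpg across the two transmission types in mtcars
result <- t.test(mpg ~ am, data = mtcars)

# Extract just the p-value from the test result
p_value <- result$p.value
print(p_value)

# Compare against the conventional 0.05 threshold
print(p_value < 0.05)
```

Passing `var.equal = TRUE` would run the classic pooled-variance t-test instead, which gives a slightly different p-value.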

Plotly plot

This 3D graph shows the relationship between weight, horsepower, and quarter-mile time. The colors represent how many cylinders a car has, which helps us recognize different patterns within the graph. How does this relate back to p-values? Well, these patterns suggest that there may be differences between the groups, and p-values tell us whether those differences are statistically significant or could simply be due to random chance.
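One way to put a p-value on a group difference like this is a one-way ANOVA. The sketch below is an assumption for illustration, not an analysis from the slides: it uses the built-in `mtcars` data and tests whether mean horsepower differs across cylinder counts.

```r
# Illustrative choice (not from the slides): compare horsepower across cylinder groups
fit <- aov(hp ~ factor(cyl), data = mtcars)

# The ANOVA table stores the p-value in the "Pr(>F)" column; the first row is the group term
anova_p <- summary(fit)[[1]][["Pr(>F)"]][1]
print(anova_p)

# Null hypothesis: all cylinder groups share the same mean horsepower
print(anova_p < 0.05)
```

The same pattern works for weight or quarter-mile time by swapping `hp` for `wt` or `qsec`.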

Definition of a P-Value in LaTeX

\[ p = P(\text{observing data at least as extreme as } x \mid H_0 \text{ is true}) \]

A simple analogy...

\[ p = P(\text{70 heads or more } \mid \text{ Coin is fair}) \]

Imagine you want to test whether a coin is fair. You flip it 100 times and get 70 heads. This equation represents the probability of observing 70 or more heads by chance alone if the coin really is fair. If this probability is very small, it suggests the coin may not be fair.
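This probability can be computed exactly from the binomial distribution. A minimal sketch in base R, assuming 100 independent flips and a one-sided question (too many heads):

```r
# Setup from the analogy: 100 flips, 70 heads, fair coin under the null
n_flips <- 100
n_heads <- 70
p_fair <- 0.5

# P(X >= 70) = 1 - P(X <= 69), so the upper tail starts just above 69
p_tail <- pbinom(n_heads - 1, size = n_flips, prob = p_fair, lower.tail = FALSE)
print(p_tail)

# The built-in exact binomial test reports the same one-sided p-value
p_test <- binom.test(n_heads, n_flips, p = p_fair, alternative = "greater")$p.value
print(p_test)
```

The result is far below 0.05, so 70 heads in 100 flips would be very surprising for a fair coin.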