John Miller
2025-11-16
The p-value is a probability that indicates the strength of evidence against the null hypothesis in a statistical test.
To understand the p-value in practice, we will work through a simple experiment adapted from the famous “Lady Tasting Tea” experiment.
Before we get into that, it is important to lay out exactly what the p-value is and how it is represented mathematically.
\[ p = \Pr(T \ge t \mid H_0) \]
What Does This Equation Mean?
\(H_0\) is our null hypothesis (the assumption we start with).
\(T\) is the test statistic viewed as a random variable: the value it would take if the experiment were repeated many times.
\(t\) is the test statistic actually observed.
\(\Pr\) denotes probability.
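Putting this all together, the equation asks:
How surprising is data at least this extreme if our original assumption is true?
-If p is small (p < 0.05, the conventional threshold), data this extreme would be unlikely under the null hypothesis, so we reject it.
-If p is large (p >= 0.05), the data are consistent with the null hypothesis, so we fail to reject it. This does not prove the null hypothesis is true.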
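The definition above can be made concrete with a small R sketch. It assumes the test statistic is a count of successes that follows a binomial distribution under \(H_0\). The function name `tail_p_value` and its arguments are hypothetical, chosen only for illustration.

```r
# Hypothetical helper: p = Pr(T >= t | H_0), assuming that under H_0
# the test statistic T follows a Binomial(n, prob) distribution.
tail_p_value <- function(t, n, prob = 0.5) {
  # Add up the probabilities of every outcome at least as extreme as t
  sum(dbinom(t:n, size = n, prob = prob))
}

# Example: 8 trials, each with a 50% chance of success under H_0
p_six   <- tail_p_value(6, n = 8)  # 6 or more successes
p_eight <- tail_p_value(8, n = 8)  # all 8 successes

p_six   < 0.05  # FALSE: fail to reject H_0
p_eight < 0.05  # TRUE: reject H_0
```

Summing `dbinom` over the tail keeps the code close to the formula. Base R's `pbinom(t - 1, n, prob, lower.tail = FALSE)` should give the same value.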
A scientist wants to test the claim that a certain woman can tell, just
by tasting, whether milk was poured into a coffee cup before the coffee
or after it. The scientist prepares eight cups: four with milk first and
four with milk last. The woman tastes each cup and guesses which kind it
is. When the experiment is over, she has identified all eight cups
correctly.
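Under the null hypothesis she is only guessing. Treating each cup as an independent 50/50 guess, the probability of getting all eight right is \[p = (0.5)^8 \approx 0.0039\]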
Because p is well below 0.05, we reject the null hypothesis. If the woman
were only guessing, she would have had about a 0.4% chance of getting all eight cups right.
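As a check on the arithmetic, the same number can be reached in R three ways. This sketch assumes, as the calculation above does, that under the null hypothesis each of the eight guesses is an independent 50/50 call. The variable names, the seed, and the simulation size are illustrative choices.

```r
n_cups <- 8

# 1. Direct calculation: eight independent 50/50 guesses, all correct
p_direct <- 0.5^n_cups

# 2. Exact one-sided binomial test from base R's stats package
p_test <- binom.test(x = n_cups, n = n_cups, p = 0.5,
                     alternative = "greater")$p.value

# 3. Simulation: repeat the guessing experiment many times and count
#    how often a pure guesser gets all eight cups right
set.seed(1)
n_sim   <- 100000
correct <- rbinom(n_sim, size = n_cups, prob = 0.5)
p_sim   <- mean(correct == n_cups)
```

The first two values agree exactly. The simulated value should land close to them, with the gap shrinking as `n_sim` grows.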
This first interactive plot shows the probability distribution as a curve. The gold line marks the number of cups the woman guessed correctly, which was all eight.
These next two plots show the probability of each possible number of correct guesses.
The first plot shows the shape of the binomial distribution by connecting the points.
The second plot simply shows each point in isolation.
library(plotly)
library(ggplot2)

# Binomial probabilities of 0 to 8 correct guesses out of 8 (50% chance per guess)
x <- 0:8
y <- dbinom(x, size = 8, prob = 0.5)

# Interactive plot: the distribution as a curve, with a gold vertical line
# at the observed result of 8 correct guesses
plot_ly(x = x, y = y, type = "scatter", mode = "lines") %>%
  layout(
    xaxis = list(title = "Number of Correct Guesses"),
    yaxis = list(title = "Probability"),
    shapes = list(list(type = "line", x0 = 8, x1 = 8, y0 = 0, y1 = max(y),
                       line = list(color = "gold", width = 2)))
  )

# Static plots: the same probabilities drawn with ggplot2
df <- data.frame(Cguesses = x, Probability = y)
g <- ggplot(data = df, aes(x = Cguesses, y = Probability)) +
  geom_point() +
  labs(x = "Number of Correct Guesses")

# First plot: points connected to show the shape of the distribution
g + geom_line() + coord_cartesian(ylim = c(0, max(y)))

# Second plot: each point in isolation
g