What is the p-Value in Statistics?

The p-value is a probability that indicates the strength of evidence against the null hypothesis in a statistical test.

To understand the p-value in practice, we will work through a simple experiment adapted from the famous “Lady Tasting Tea” experiment.

Before we get into that, it is important to lay out exactly what the p-value is and how it is represented mathematically.

Mathematical Definition for a One-Sided Right-Tail Test-Statistic Distribution

\[ p = \Pr(T \geq t \mid H_0) \]

What Does This Equation Mean?

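\(H_0\) is our null hypothesis (the assumption we start with).
\(T\) is the test statistic viewed as a random variable: the values it would take if the experiment were repeated many times.
\(t\) is the value of the test statistic actually observed.
\(\Pr\) denotes probability, here computed assuming \(H_0\) is true.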

Putting this all together, the equation asks:

How surprising is a result at least as extreme as the one we observed if our original assumption is true?

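-If p is small (conventionally p < 0.05), the observed data would be unlikely under the null hypothesis, so we reject it.
-If p is large (p >= 0.05), the data are consistent with the null hypothesis and we fail to reject it. This does not prove that the null hypothesis is true.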
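
As a rough sketch of the definition above, the right-tail probability can be computed directly when the test statistic follows a binomial distribution under \(H_0\). The function names right_tail_p and right_tail_sum and their arguments are made up for this illustration; pbinom and dbinom are base R's binomial distribution functions. The example value (6 or more heads in 8 fair coin flips) is only a placeholder.

```r
# Hypothetical helper (not from the original post): right-tail p-value
# Pr(T >= t | H_0) when T ~ Binomial(n, prob) under the null hypothesis.
right_tail_p <- function(t, n, prob = 0.5) {
  # pbinom(t - 1, ..., lower.tail = FALSE) gives Pr(T > t - 1) = Pr(T >= t)
  pbinom(t - 1, size = n, prob = prob, lower.tail = FALSE)
}

# The same quantity written as an explicit sum over the right tail
right_tail_sum <- function(t, n, prob = 0.5) {
  sum(dbinom(t:n, size = n, prob = prob))
}

p_example <- right_tail_p(6, n = 8)   # 6 or more successes out of 8
p_check   <- right_tail_sum(6, n = 8) # should agree with p_example
print(c(p_example, p_check))
```

Both functions return the same number; the first is simply the shorter way to write the sum in the second.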

Example of the p-Value in an Experiment

A scientist wants to test the claim that a certain woman can tell, just by tasting, whether milk was poured into a cup before or after the coffee. The scientist prepares eight cups, four with the milk poured first and four with the milk poured last. The woman then tastes each cup and says which way it was prepared.
After the experiment is over, it is recorded that the woman identified all eight cups of coffee correctly.

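Under the null hypothesis that she is only guessing, and treating each cup as an independent 50/50 guess, the probability of getting all eight right is \[p = (0.5)^8 \approx 0.0039\]
Because this p-value is well below 0.05, we reject the null hypothesis: if the woman were only guessing, there would be about a 0.4% chance of her getting all eight cups right.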
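
The arithmetic for this experiment can be checked in R. The sketch below assumes that each of the eight guesses is an independent 50/50 call under the null hypothesis; the variable names, the seed, and the number of simulated experiments are choices made for this illustration.

```r
n_cups    <- 8   # cups tasted
n_correct <- 8   # cups identified correctly

# Exact right-tail p-value: Pr(T >= 8) when T ~ Binomial(8, 0.5)
p_exact <- sum(dbinom(n_correct:n_cups, size = n_cups, prob = 0.5))

# The same value from base R's exact binomial test (one-sided, "greater")
p_test <- as.numeric(
  binom.test(n_correct, n_cups, p = 0.5, alternative = "greater")$p.value
)

# Monte Carlo estimate: repeat the experiment many times with a pure guesser
set.seed(1)  # arbitrary seed so the run is reproducible
sim_correct <- rbinom(100000, size = n_cups, prob = 0.5)
p_sim <- mean(sim_correct >= n_correct)

print(c(exact = p_exact, binom_test = p_test, simulated = p_sim))
```

The exact value equals (0.5)^8, and the simulated estimate should land close to it, with the gap shrinking as the number of simulated experiments grows.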

Let’s Take a Look at this as Plots

This first interactive plot shows the probability distribution as a curve. The gold line marks the number of cups the woman identified correctly, which was all eight.

More Plots

These next two plots show the probability of each possible number of correct guesses, from zero to eight.

The first plot shows the shape of the binomial distribution by connecting the points.

The second plot simply shows each point in isolation.
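
The heights plotted in these figures can also be listed as a table. This is a small sketch using base R's dbinom, assuming eight independent 50/50 guesses; the data frame name prob_table and its column names are made up for this illustration.

```r
# Probability of each possible number of correct guesses out of 8
prob_table <- data.frame(
  correct     = 0:8,
  probability = dbinom(0:8, size = 8, prob = 0.5)
)

print(prob_table, digits = 3)

# The probabilities cover every possible outcome, so they sum to 1
total <- sum(prob_table$probability)
print(total)
```

The last row of the table (eight correct) is the p-value from the experiment, since no outcome is more extreme than a perfect score.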

Code Behind All Three Plots

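library(plotly)
library(ggplot2)  # used for the two ggplot figures further down

# Binomial probabilities for 0 to 8 correct guesses when each guess is 50/50
x <- 0:8
y <- dbinom(x, size = 8, prob = 0.5)

# Interactive curve with a gold vertical line at the observed result (8 correct)
plot_ly(x = x, y = y, type = "scatter", mode = "lines") %>%
  layout(
    xaxis = list(title = "Number of Correct Guesses"),
    yaxis = list(title = "Probability"),
    shapes = list(list(type = "line", x0 = 8, x1 = 8, y0 = 0, y1 = max(y),
                       line = list(color = "gold", width = 2)))
  )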

# Point plot: one point per possible number of correct guesses
x <- 0:8
y <- dbinom(x, size = 8, prob = 0.5)

df <- data.frame(Cguesses = x, Probability = y)

g <- ggplot(data = df, aes(x = Cguesses, y = Probability)) + geom_point() +
  labs(x = "Number of Correct Guesses")

g  # display the point plot

# Line plot: the same points, connected to show the shape of the distribution
x <- 0:8
y <- dbinom(x, size = 8, prob = 0.5)

df <- data.frame(Cguesses = x, Probability = y)

g <- ggplot(data = df, aes(x = Cguesses, y = Probability)) + geom_point() +
  labs(x = "Number of Correct Guesses")

g + geom_line() + coord_cartesian(ylim = c(0, max(y)))