P-Value

What is a p-value?

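The p-value measures how compatible an experiment's results are with the null hypothesis \(H_0\), the statement that there is no effect or no relationship between the variables being studied. Formally, it is the probability, assuming \(H_0\) is true, of obtaining a test statistic at least as extreme as the one actually observed.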

Because it is a probability, it takes values between 0 and 1. The smaller the p-value, the stronger the evidence against the null hypothesis \(H_0\).

Calculating a p-value

Suppose our test statistic is:

\[ z_{obs} = 2 \]

For a two-sided test, the p-value is equal to:

\[ p = 2P(Z \ge 2) \]

Using the standard normal distribution:

\[ P(Z \ge 2) = 0.0228 \]

Therefore,

\[ p = 2(0.0228) = 0.0456 \]

Since \(p = 0.0456 < 0.05\), we reject \(H_0\) at the 5% significance level (\(\alpha = 0.05\)).
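The same two-sided p-value can be checked in R with `pnorm()`, the standard normal CDF; `lower.tail = FALSE` returns the upper tail \(P(Z \ge z)\). A minimal sketch, with the observed value \(z_{obs} = 2\) hard-coded from the example:

```r
# Observed z statistic from the example above
z_obs <- 2

# Upper-tail probability P(Z >= |z_obs|) under the standard normal
upper_tail <- pnorm(abs(z_obs), lower.tail = FALSE)

# Two-sided p-value: double the upper tail
p_value <- 2 * upper_tail

print(round(upper_tail, 4))
print(round(p_value, 4))

# Compare with the 5% significance level
alpha <- 0.05
if (p_value < alpha) {
  cat("Reject H0 at alpha =", alpha, "\n")
} else {
  cat("Fail to reject H0 at alpha =", alpha, "\n")
}
```

Unrounded, the p-value is about 0.0455; the 0.0456 above comes from doubling the already-rounded tail area 0.0228. The conclusion is the same either way.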

Example using the iris data frame in R

We will test:

\(H_0: \mu = 5.8\) versus \(H_1: \mu \ne 5.8\)

That is, we are testing whether the true mean sepal length differs from 5.8 cm.

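The test statistic is

\[ t = \frac{\bar{x} - \mu_0}{s/\sqrt{n}} \]

where \(\mu_0 = 5.8\) is the hypothesized mean and: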

\(\bar{x}\) = sample mean of Sepal.Length
\(s\) = sample standard deviation
\(n\) = sample size
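These sample quantities can be computed directly in R. A minimal sketch, assuming the built-in `iris` data (from the base `datasets` package) and the hypothesized mean \(\mu_0 = 5.8\); it plugs them into the one-sample statistic \(t = (\bar{x} - \mu_0)/(s/\sqrt{n})\):

```r
# iris ships with base R, so no extra packages are needed
x   <- iris$Sepal.Length
mu0 <- 5.8                 # hypothesized mean under H0

x_bar <- mean(x)           # sample mean
s     <- sd(x)             # sample standard deviation (n - 1 denominator)
n     <- length(x)         # sample size

# One-sample t statistic: (x_bar - mu0) / (s / sqrt(n))
t_obs <- (x_bar - mu0) / (s / sqrt(n))

print(round(c(x_bar = x_bar, s = s, n = n, t = t_obs), 5))
```

The hand-computed `t_obs` should match the `t = 0.64092` reported by `t.test()` in the results below.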

Distribution of Sepal Length

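library(ggplot2)

# Histogram of sepal length, scaled to density so the bar areas sum to 1
ggplot(iris, aes(x = Sepal.Length)) +
  geom_histogram(aes(y = after_stat(density)),
                 bins = 15,
                 fill = "lightblue",
                 color = "black") +
  labs(title = "Distribution of Sepal Length",
       x = "Sepal length (cm)",
       y = "Density")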

One-Sample t-Test

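To test this hypothesis we use a one-sample t-test. Because the population standard deviation is unknown and must be estimated by \(s\), the test statistic follows a t-distribution with \(n - 1\) degrees of freedom (here \(150 - 1 = 149\)) under \(H_0\), rather than the standard normal distribution.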
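The call that produced the results below is not shown; the output is consistent with a call like the following, so treat the exact arguments as an assumption. `alternative = "two.sided"` and `conf.level = 0.95` are the `t.test()` defaults and are written out only for clarity.

```r
# One-sample, two-sided t-test of H0: mu = 5.8 against H1: mu != 5.8
result <- t.test(iris$Sepal.Length, mu = 5.8,
                 alternative = "two.sided", conf.level = 0.95)
print(result)

# Individual components of the returned "htest" object
print(result$statistic)   # observed t
print(result$parameter)   # degrees of freedom
print(result$p.value)     # two-sided p-value
print(result$conf.int)    # 95% confidence interval for the mean
```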

Results:

## 
##  One Sample t-test
## 
## data:  iris$Sepal.Length
## t = 0.64092, df = 149, p-value = 0.5226
## alternative hypothesis: true mean is not equal to 5.8
## 95 percent confidence interval:
##  5.709732 5.976934
## sample estimates:
## mean of x 
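##  5.843333

Since \(p = 0.5226 > 0.05\), we fail to reject \(H_0\) at the 5% significance level: a sample mean of 5.843 cm is consistent with a true mean of 5.8 cm, and the 95% confidence interval (5.710, 5.977) contains 5.8.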

Test Statistic Visualization
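The original plot is not reproduced here. A minimal base-R sketch of the idea, assuming the observed \(t = 0.64092\) and 149 degrees of freedom from the `t.test()` output: it draws the t density under \(H_0\) and shades the two tails beyond \(\pm|t_{obs}|\), whose combined area is the p-value. Base graphics are used only to avoid a package dependency; the earlier histogram uses ggplot2.

```r
# Observed statistic and degrees of freedom taken from the t.test() output
t_obs <- 0.64092
df    <- 149

# Grid of t values and the t density under H0
t_grid <- seq(-4, 4, length.out = 401)
dens   <- dt(t_grid, df = df)

plot(t_grid, dens, type = "l",
     xlab = "t", ylab = "Density",
     main = "t-distribution (df = 149) with two-sided tail areas")

# Grid points in the left tail (t <= -|t_obs|) and right tail (t >= |t_obs|)
left  <- t_grid[t_grid <= -abs(t_obs)]
right <- t_grid[t_grid >=  abs(t_obs)]

# Shade each tail by closing the curve down to the x-axis
polygon(c(min(left), left, max(left)), c(0, dt(left, df = df), 0),
        col = "lightblue", border = NA)
polygon(c(min(right), right, max(right)), c(0, dt(right, df = df), 0),
        col = "lightblue", border = NA)

# Redraw the curve on top of the shading and mark -|t_obs| and +|t_obs|
lines(t_grid, dens)
abline(v = c(-abs(t_obs), abs(t_obs)), lty = 2)
```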

How the t-distribution changes with degrees of freedom
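This plot is also not reproduced here. A base-R sketch of the same idea, with the degrees of freedom 1, 3, 10 and 30 chosen purely for illustration (they are assumptions, not values from the original figure): as the degrees of freedom grow, the t density approaches the standard normal, and its heavier tails thin out. At 149 degrees of freedom, as in the iris test, the two are nearly indistinguishable.

```r
grid <- seq(-4, 4, length.out = 400)
dfs  <- c(1, 3, 10, 30)                        # illustrative choices only
cols <- c("red", "orange", "darkgreen", "blue")

# Standard normal as the reference curve
plot(grid, dnorm(grid), type = "l", lwd = 2,
     xlab = "t", ylab = "Density",
     main = "t-distribution for several degrees of freedom")

# Overlay one t density per df value
for (i in seq_along(dfs)) {
  lines(grid, dt(grid, df = dfs[i]), col = cols[i])
}
legend("topright", legend = c("N(0, 1)", paste("df =", dfs)),
       col = c("black", cols), lwd = c(2, rep(1, length(dfs))))

# Two-sided tail probability P(|T| >= 2) for each df, versus the normal
tail_prob <- sapply(dfs, function(d) 2 * pt(2, df = d, lower.tail = FALSE))
names(tail_prob) <- paste0("df=", dfs)
normal_tail <- 2 * pnorm(2, lower.tail = FALSE)
print(round(c(tail_prob, normal = normal_tail), 4))
```

The printed tail probabilities shrink as df increases and stay above the normal value, which is why the same observed statistic gives a larger p-value under a t-distribution with few degrees of freedom.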

Summary

The formula:

\[ t = \frac{\bar{x} - \mu_0}{s/\sqrt{n}} \]

gives us the observed test statistic.

The p-value formula:

\[ p = P(|T| \ge |t_{obs}|) = 2P(T \ge |t_{obs}|) \]

corresponds to the tail areas shown in the t-distribution plots.
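As a check on the tail-area interpretation, the sketch below computes the p-value two ways from the observed \(t = 0.64092\) with 149 degrees of freedom (values taken from the `t.test()` output): with the t CDF `pt()`, and by numerically integrating the t density `dt()` over both tails with `integrate()`. Both should agree with the 0.5226 reported by `t.test()`.

```r
t_obs <- 0.64092   # observed statistic from the t.test() output
df    <- 149       # n - 1 degrees of freedom

# Closed form: p = P(|T| >= |t_obs|) = 2 * P(T >= |t_obs|)
p_formula <- 2 * pt(abs(t_obs), df = df, lower.tail = FALSE)

# Same quantity as the area under the t density in both tails
left_tail  <- integrate(dt, lower = -Inf, upper = -abs(t_obs), df = df)$value
right_tail <- integrate(dt, lower = abs(t_obs), upper = Inf, df = df)$value
p_area     <- left_tail + right_tail

print(round(c(formula = p_formula, area = p_area), 4))
```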

Thus, the p-value is the total area under the t-distribution in both tails beyond \(\pm|t_{obs}|\), computed under the assumption that \(H_0\) is true.