2026-03-06

P-Values

  • In hypothesis testing, scientists often aim to support their hypothesis by rejecting a “null hypothesis” using the data they collect.
  • The null hypothesis typically assumes there is no relationship between the variables being tested, while the alternative hypothesis assumes that there is a relationship.
  • A p-value, or probability value, measures the probability of obtaining results at least as extreme as those in a given dataset, assuming that the null hypothesis is true: \[P = \Pr(\text{results at least as extreme as those observed} \mid \text{null hypothesis is true})\]
  • If that probability is below a chosen significance cutoff (e.g. 0.05 or 0.01), we reject the null hypothesis, which is generally taken as evidence in favor of the alternative hypothesis.
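  • As a minimal sketch of this definition, the code below computes a p-value by hand for a hypothetical coin-flip experiment (60 heads in 100 flips, with a fair coin as the null hypothesis) and compares it with R's built-in `binom.test`. The numbers and variable names are illustrative assumptions, not part of the iris example.

```r
# Hypothetical data: 60 heads in 100 flips. Null hypothesis: the coin is fair.
heads <- 60
flips <- 100

# Probability of a result at least as extreme as 60 heads under the null.
# The null distribution is symmetric around 50, so the two-sided p-value
# is twice the upper-tail probability P(X >= 60).
p_manual <- 2 * pbinom(heads - 1, size = flips, prob = 0.5, lower.tail = FALSE)

# R's exact binomial test, for comparison.
p_builtin <- binom.test(heads, flips, p = 0.5)$p.value

# Compare the p-value with a significance cutoff.
alpha <- 0.05
reject_null <- p_manual < alpha

print(c(manual = p_manual, builtin = p_builtin))
print(reject_null)
```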

Statistical Tests

  • There is no single equation for calculating a p-value. The calculation depends on the statistical test being conducted.
  • Some common statistical tests include:
    • Chi-square for categorical data
    • T-test for comparing means across 2 groups with normally distributed data
    • ANOVA for comparing means across 3+ groups with normally distributed data
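  • Each of these tests has a base R function. The sketch below runs all three on the built-in `iris` data and pulls out the p-values. The `long_petal` variable is a hypothetical categorical variable created only for this illustration.

```r
# Chi-square test: is species associated with having a long petal?
# long_petal is a made-up categorical variable for this sketch.
long_petal <- iris$Petal.Length > median(iris$Petal.Length)
chisq_p <- chisq.test(table(iris$Species, long_petal))$p.value

# t-test: compare mean petal length across 2 species.
two_species <- droplevels(subset(iris, Species %in% c("setosa", "versicolor")))
ttest_p <- t.test(Petal.Length ~ Species, data = two_species)$p.value

# ANOVA: compare mean petal length across all 3 species.
anova_fit <- aov(Petal.Length ~ Species, data = iris)
anova_p <- summary(anova_fit)[[1]][["Pr(>F)"]][1]

print(c(chi_square = chisq_p, t_test = ttest_p, anova = anova_p))
```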

Usage Example

  • In this example, we will use a t-test, and the base R dataset ‘iris’.
  • For simplicity, we will compare petal length between two iris species: setosa and versicolor.
  • First, we can extract the petal lengths for just our species of interest:
# Petal lengths (cm) for each species, as numeric vectors
setosa <- iris$Petal.Length[iris$Species == "setosa"]
versicolor <- iris$Petal.Length[iris$Species == "versicolor"]

Iris Petal Length

  • A t-test asks: do these two groups have different means?
  • In this case: do setosa and versicolor irises have different mean petal lengths?
  • In the graph below, we see that they have very different mean petal lengths. But is that difference statistically significant?
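  • A comparison plot of this kind can be drawn with base R. The sketch below is one possible version using `boxplot`; the plot type, labels, and title are assumptions and may not match the original figure.

```r
# Petal lengths (cm) for the two species being compared.
setosa <- iris$Petal.Length[iris$Species == "setosa"]
versicolor <- iris$Petal.Length[iris$Species == "versicolor"]

# Sample means of each group.
group_means <- c(setosa = mean(setosa), versicolor = mean(versicolor))
print(group_means)

# Side-by-side boxplots of the two groups.
boxplot(
  list(setosa = setosa, versicolor = versicolor),
  ylab = "Petal length (cm)",
  main = "Iris Petal Length"
)
```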

Data Distribution

  • We can also view this using a histogram, and check whether our data are approximately normally distributed, as the t-test assumes.
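  • One way to do this check in base R is sketched below: a histogram per species, plus a Shapiro-Wilk test as a numerical complement. The Shapiro-Wilk test is an addition for illustration and is not part of the original slides.

```r
setosa <- iris$Petal.Length[iris$Species == "setosa"]
versicolor <- iris$Petal.Length[iris$Species == "versicolor"]

# Draw the two histograms side by side, then restore the plot layout.
old_par <- par(mfrow = c(1, 2))
hist(setosa, main = "setosa", xlab = "Petal length (cm)")
hist(versicolor, main = "versicolor", xlab = "Petal length (cm)")
par(old_par)

# Shapiro-Wilk test: the null hypothesis is that the data are normal,
# so a small p-value suggests a departure from normality.
shapiro_setosa <- shapiro.test(setosa)
shapiro_versicolor <- shapiro.test(versicolor)
print(c(setosa = shapiro_setosa$p.value, versicolor = shapiro_versicolor$p.value))
```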

T-test

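  • This is the equation for the t-statistic in its unequal-variance (Welch) form, where \(\bar{x}\), \(s^2\), and \(n\) are each group's sample mean, sample variance, and sample size: \[t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}}\]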
  • It calculates how many standard errors apart the two sample means are.
  • The null hypothesis is that the mean lengths are the same, while the alternative is that the mean lengths are different.
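  • To make the equation concrete, the sketch below computes the t-statistic by hand for the iris data and compares it with `t.test`. The degrees-of-freedom line uses the Welch-Satterthwaite approximation, which is not shown in the equation above and is included here as an assumption about how the `df` value is obtained.

```r
setosa <- iris$Petal.Length[iris$Species == "setosa"]
versicolor <- iris$Petal.Length[iris$Species == "versicolor"]

# Per-group variance divided by sample size (the terms under the square root).
v1 <- var(setosa) / length(setosa)
v2 <- var(versicolor) / length(versicolor)

# t = difference in sample means / standard error of that difference.
t_manual <- (mean(setosa) - mean(versicolor)) / sqrt(v1 + v2)

# Welch-Satterthwaite approximation for the degrees of freedom.
df_manual <- (v1 + v2)^2 /
  (v1^2 / (length(setosa) - 1) + v2^2 / (length(versicolor) - 1))

# Compare with R's built-in test.
result <- t.test(setosa, versicolor)
print(c(t_manual = t_manual, t_builtin = unname(result$statistic)))
print(c(df_manual = df_manual, df_builtin = unname(result$parameter)))
```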

T-test on Iris Petal Length Data

  • We can run a t-test in R using the following function:
t.test(setosa, versicolor)
## 
##  Welch Two Sample t-test
## 
## data:  setosa and versicolor
## t = -39.493, df = 62.14, p-value < 2.2e-16
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -2.939618 -2.656382
## sample estimates:
## mean of x mean of y 
##     1.462     4.260

T-test on Iris Petal Length Data

  • The reported p-value is below 2.2e-16 (about 0.0000000000000002), much lower than our cutoff of 0.05. R prints it as “< 2.2e-16” rather than showing the exact value.
  • So, this p-value means that the difference in petal length is statistically significant: if the null hypothesis were true, the chance of getting a difference at least as large as the one we observed would be less than 0.00000000000002%.
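  • The exact p-value is stored in the object that `t.test` returns, even though the printed summary only shows a bound. The sketch below extracts it and compares it with the cutoff. The link between the printed “< 2.2e-16” and `.Machine$double.eps` is an assumption about R's default p-value formatting.

```r
setosa <- iris$Petal.Length[iris$Species == "setosa"]
versicolor <- iris$Petal.Length[iris$Species == "versicolor"]

# Store the test result instead of only printing it.
result <- t.test(setosa, versicolor)

# The unrounded p-value.
p_value <- result$p.value
print(p_value)

# Machine epsilon, which appears to be the threshold behind "< 2.2e-16".
print(.Machine$double.eps)

# Decision at the 0.05 significance cutoff.
alpha <- 0.05
significant <- p_value < alpha
print(significant)
```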

Visualizing the T-test

  • We can graph the T-test values to visualize how our test works:

Visualizing the T-test

  • Based on the graph, a t-statistic of 0 means that there is no difference between the two petal length means, and anything falling within the red range would have a p-value < 0.05.
  • Our t-statistic is -39.49, far beyond the lower cutoff of about -2, which is why the p-value is so low and our test has very strong statistical significance.
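  • The cutoff of about -2 comes from the t-distribution with the test's degrees of freedom. The sketch below computes the two-sided 0.05 cutoffs with `qt` and draws the distribution with the cutoffs marked. The axis range and styling are assumptions and may differ from the original figure.

```r
setosa <- iris$Petal.Length[iris$Species == "setosa"]
versicolor <- iris$Petal.Length[iris$Species == "versicolor"]
result <- t.test(setosa, versicolor)

t_obs <- unname(result$statistic)   # observed t-statistic
df <- unname(result$parameter)      # Welch degrees of freedom

# Two-sided test at 0.05: 2.5% of the probability lies in each tail.
t_crit <- qt(0.025, df)             # lower cutoff; the upper one is -t_crit
print(c(lower = t_crit, upper = -t_crit))

# t-distribution under the null hypothesis, with the cutoffs in red.
t_grid <- seq(-5, 5, length.out = 400)
plot(t_grid, dt(t_grid, df), type = "l",
     xlab = "t-statistic", ylab = "Density")
abline(v = c(t_crit, -t_crit), col = "red", lty = 2)

# The observed statistic lies far to the left of the plotted range.
print(t_obs < t_crit)
```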

References