2025-03-15

Introduction

  • The p-value is an essential concept in statistics and hypothesis testing.
  • It is a number that helps determine whether there is enough evidence to reject a null hypothesis.
  • The significance level \(\alpha\) is typically set at 0.05: a p-value below 0.05 is treated as strong evidence against the null hypothesis, while a p-value of 0.05 or above is treated as weak evidence against it.
  • It does not measure the probability that the null hypothesis is true. Instead, it is the probability of obtaining a test statistic at least as extreme as the one we observed, assuming the null hypothesis is true.
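
To make the "at least as extreme" idea concrete, here is a minimal sketch of a two-sided p-value computed as a tail probability of the t distribution. The statistic `t_stat` and degrees of freedom `dof` are made-up values for illustration, not results from any data set.

```r
# Hypothetical values for illustration only; not taken from real data.
t_stat <- 2.1   # assumed observed t statistic
dof <- 20       # assumed degrees of freedom

# Two-sided p-value: probability, if H0 is true, of a t statistic
# at least as far from 0 as the observed one (both tails combined).
p_value <- 2 * pt(-abs(t_stat), df = dof)
p_value
```

With these assumed inputs the result falls just under 0.05, so a small change in the statistic could move it to the other side of the threshold.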

Interpreting p-Values

  • Low p-value (\(p < 0.05\)): Reject \(H_0\); strong evidence against the null hypothesis.
  • High p-value (\(p \geq 0.05\)): Fail to reject \(H_0\); weak evidence against the null hypothesis.
  • Failing to reject \(H_0\) does not show that \(H_0\) is true, only that the data do not provide enough evidence against it.
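
The decision rule above can be sketched as a small function. `decide()` is a hypothetical helper written for this slide, not part of base R, and the p-values passed to it are made up.

```r
# Hypothetical helper: applies the reject / fail-to-reject rule
# at significance level alpha (0.05 by default).
decide <- function(p, alpha = 0.05) {
  ifelse(p < alpha, "reject H0", "fail to reject H0")
}

# Made-up p-values; note that p = 0.05 exactly does not reject.
decisions <- decide(c(0.001, 0.05, 0.30))
decisions
```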

Tests

  • I’ll use R’s built-in “Orange” data set to show an example of using a p-value.
data("Orange")
summary(Orange)
##  Tree       age         circumference  
##  3:7   Min.   : 118.0   Min.   : 30.0  
##  1:7   1st Qu.: 484.0   1st Qu.: 65.5  
##  5:7   Median :1004.0   Median :115.0  
##  2:7   Mean   : 922.1   Mean   :115.9  
##  4:7   3rd Qu.:1372.0   3rd Qu.:161.5  
##        Max.   :1582.0   Max.   :214.0
  • Since the median age of the trees is 1004 days, I’ll use 1000 days as the cutoff between “young” and “old” trees.
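
As a quick check on this cutoff, the sketch below counts how many measurements fall on each side of it. Ages in `Orange` are recorded in days; the labels "young" and "old" and the variable names are just the ones used for this illustration.

```r
data("Orange")

# Median age (in days) that motivates the cutoff
age_median <- median(Orange$age)

# Label each measurement by the 1000-day cutoff and count the groups
groups <- ifelse(Orange$age < 1000, "young", "old")
counts <- table(groups)
counts
```

The two groups are not the same size, which is one reason a test that does not assume equal group variances is a reasonable default here.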

Tests

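# Split circumference by tree age at the 1000-day cutoff
young_trees <- Orange$circumference[Orange$age < 1000]
old_trees <- Orange$circumference[Orange$age >= 1000]
# Welch two-sample t-test (t.test() defaults to var.equal = FALSE)
t_test <- t.test(young_trees, old_trees)
t_test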
## 
##  Welch Two Sample t-test
## 
## data:  young_trees and old_trees
## t = -9.245, df = 32.451, p-value = 1.306e-10
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -117.85171  -75.31496
## sample estimates:
## mean of x mean of y 
##  60.66667 157.25000
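
To show where the numbers in this output come from, here is a sketch that rebuilds the Welch statistic, its degrees of freedom, and the two-sided p-value by hand. The intermediate names (`v_young`, `v_old`, `t_stat`, `dof`) are introduced only for this illustration.

```r
data("Orange")
young_trees <- Orange$circumference[Orange$age < 1000]
old_trees <- Orange$circumference[Orange$age >= 1000]

# Variance of each sample mean: s^2 / n
v_young <- var(young_trees) / length(young_trees)
v_old <- var(old_trees) / length(old_trees)

# Welch t statistic: difference in means divided by its standard error
t_stat <- (mean(young_trees) - mean(old_trees)) / sqrt(v_young + v_old)

# Welch-Satterthwaite approximation to the degrees of freedom
dof <- (v_young + v_old)^2 /
  (v_young^2 / (length(young_trees) - 1) + v_old^2 / (length(old_trees) - 1))

# Two-sided p-value from the t distribution
p_value <- 2 * pt(-abs(t_stat), df = dof)
c(t = t_stat, df = dof, p = p_value)
```

These values should agree with the `t`, `df`, and `p-value` fields printed by `t.test()` above.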

Hypothesis

  • Null Hypothesis (\(H_0\)): The mean circumference is the same for young and old trees.
  • Alternative Hypothesis: The mean circumference is different between young and old trees.
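
In symbols, the two hypotheses can be written as below. The notation \(\mu_{\text{young}}\) and \(\mu_{\text{old}}\) for the population mean circumferences, and \(H_1\) for the alternative, is introduced here for illustration.

```latex
\[
H_0 : \mu_{\text{young}} = \mu_{\text{old}}
\qquad \text{versus} \qquad
H_1 : \mu_{\text{young}} \neq \mu_{\text{old}}
\]
```

Because \(H_1\) allows a difference in either direction, the test is two-sided.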

GGPlot 1

GGPlot 2

Plotly

Our Conclusion

  • In this example, we performed a Welch two-sample t-test to compare the mean circumference of young and old trees. The null hypothesis of equal means was tested against the two-sided alternative that the means differ.

  • The p-value was \(p = 1.306 \times 10^{-10}\).

  • Since \(p < 0.05\), we reject the null hypothesis: the difference in mean circumference between young and old trees (60.67 vs. 157.25) is statistically significant at the 0.05 level.
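
The same conclusion can be reached programmatically by reading fields off the object that `t.test()` returns. This is a minimal sketch; `alpha` is a name chosen here for the 0.05 significance level.

```r
data("Orange")
young_trees <- Orange$circumference[Orange$age < 1000]
old_trees <- Orange$circumference[Orange$age >= 1000]
t_test <- t.test(young_trees, old_trees)

alpha <- 0.05                          # significance level used in these slides

p_value <- t_test$p.value              # p-value as a plain number
reject_h0 <- p_value < alpha           # TRUE means we reject H0
ci <- t_test$conf.int                  # 95% CI for the difference in means

format.pval(p_value, digits = 4)       # compact display of the p-value
reject_h0
ci
```

The confidence interval gives a complementary view: if it excludes 0, the two-sided test at the matching level rejects the null hypothesis.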