2025-03-25

1) Definition - What is a p-value?

P-value stands for probability value: the probability of obtaining an effect at least as extreme as the one observed in a sample, assuming that the null hypothesis is true.

It is used in many kinds of hypothesis tests to determine statistical significance.

P-values appear in, for example:

  • z-tests
  • t-tests
  • chi-square tests
  • ANOVA
  • regression analysis
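
As a minimal sketch of how such p-values can be obtained in R, the block below runs a t-test and a one-way ANOVA and pulls out the p-value from each result. The choice of the built-in `sleep` and `PlantGrowth` datasets is an assumption made for illustration; neither appears in the original slides.

```r
# Illustrative only: datasets chosen for this sketch, not taken from the slides.

# Two-sample (Welch) t-test on the built-in `sleep` data
t_res <- t.test(extra ~ group, data = sleep)
p_t <- t_res$p.value

# One-way ANOVA on the built-in `PlantGrowth` data
aov_table <- summary(aov(weight ~ group, data = PlantGrowth))[[1]]
p_anova <- aov_table[["Pr(>F)"]][1]  # first row is the `group` term

p_t
p_anova
```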

2) Relationship to Hypothesis Testing

  • If P-value \(\leq \alpha\), then reject \(H_0\) (null hypothesis)
  • If P-value \(> \alpha\), then fail to reject \(H_0\) (the data do not provide sufficient evidence for the alternative hypothesis \(H_A\))

where \(\alpha\) is the significance level
(a commonly used value is \(\alpha = 0.05\))
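
The decision rule above can be sketched as a small R function. The name `decide` and its return strings are hypothetical, introduced only for this illustration.

```r
# Hypothetical helper (not from the original slides): apply the decision rule.
decide <- function(p_value, alpha = 0.05) {
  if (p_value <= alpha) "reject H0" else "fail to reject H0"
}

decide(0.04)  # p-value <= alpha
decide(0.20)  # p-value > alpha
```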

3) ggplot Demonstration

Here, the p-value is \(< \alpha\), so we reject \(H_0\).
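
The code behind this demonstration is not part of the text. One possible sketch, assuming a two-sided z-test and a made-up observed statistic `z_obs`, draws the standard normal density under \(H_0\), shades the rejection region, and marks the observed value. All names and numbers here are assumptions for illustration.

```r
library(ggplot2)

alpha <- 0.05
z_obs <- 2.5                       # hypothetical observed z statistic
z_crit <- qnorm(1 - alpha / 2)     # two-sided critical value
p_value <- 2 * pnorm(-abs(z_obs))  # two-sided p-value

# Standard normal density on a grid
curve_df <- data.frame(z = seq(-4, 4, length.out = 401))
curve_df$density <- dnorm(curve_df$z)

pPlot <- ggplot(curve_df, aes(x = z, y = density)) +
  geom_line() +
  # shade both tails of the rejection region
  geom_area(data = subset(curve_df, z >= z_crit), fill = "red", alpha = 0.4) +
  geom_area(data = subset(curve_df, z <= -z_crit), fill = "red", alpha = 0.4) +
  # dashed line at the observed statistic
  geom_vline(xintercept = z_obs, linetype = "dashed") +
  labs(x = "z statistic", y = "Density under H0")
pPlot

p_value < alpha  # TRUE here, so H0 would be rejected
```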

4) Interpretation of p-value

The p-value is commonly misinterpreted; it is not an error rate!

Keep in mind:
The p-value is the probability of observing another sample statistic that is at least as extreme as the given sample statistic, assuming that the null hypothesis is true.

Example:
Say we are testing how well a medication works, and our sample data show an effect of \(\mu_0\). We find a p-value of 0.04.

Incorrect interpretation: There is a 4% chance of error when rejecting \(H_0\), i.e. there is a 4% chance of a Type I error.
Correct interpretation: Assuming that the medication has no effect, you would still expect to obtain an effect at least as large as the sample effect \(\mu_0\) in 4% of other studies, due to random sampling error.
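
The correct interpretation can be checked with a small simulation. This is a sketch under stated assumptions: the "observed" sample is itself simulated (it is not medication data), the test is a two-sided one-sample t-test, and the sample size, seed, and helper `t_stat` are hypothetical choices. The simulation repeats the study many times in a world where \(H_0\) is true and counts how often the statistic is at least as extreme as the observed one.

```r
set.seed(1)
n <- 30

# Hypothetical helper: one-sample t statistic against a null mean of 0
t_stat <- function(x) mean(x) / (sd(x) / sqrt(length(x)))

# Hypothetical observed sample (simulated for this sketch)
observed <- rnorm(n, mean = 0.4, sd = 1)
t_obs <- t_stat(observed)

# Analytic two-sided p-value from the t distribution
p_analytic <- 2 * pt(-abs(t_obs), df = n - 1)

# Repeat the study 10,000 times assuming H0 is true (true mean = 0)
t_null <- replicate(10000, t_stat(rnorm(n, mean = 0, sd = 1)))

# Share of null studies at least as extreme as the observed one
p_simulated <- mean(abs(t_null) >= abs(t_obs))

c(analytic = p_analytic, simulated = p_simulated)
```

The two numbers should be close, which is what the definition says: the p-value describes how often data this extreme arise when \(H_0\) holds, not the chance that rejecting \(H_0\) is a mistake.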

5) Plotly Linear Regression

Here we have an example with some fitted data for a sample of trees. But how do we know that the effect of girth on volume is significant? Using the p-value is one method.
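
The plotting code for this figure is not included in the text. A minimal sketch of how such a figure could be produced with the plotly package, assuming the model `Volume ~ Girth` on the built-in `trees` data (the object names `fit` and `treePlot` are hypothetical):

```r
library(plotly)

# Simple linear regression of volume on girth
fit <- lm(Volume ~ Girth, data = trees)

# Scatter plot of the observations with the fitted line overlaid
treePlot <- plot_ly(trees, x = ~Girth, y = ~Volume,
                    type = "scatter", mode = "markers", name = "Observed") %>%
  add_lines(x = ~Girth, y = fitted(fit), name = "Fitted line")
treePlot
```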

6) Continuing example for simple linear regression:

Initially we set a significance level of \(\alpha = 0.05\).
We see that the p-value for Girth is \(< 2 \times 10^{-16}\).

## 
## Call:
## lm(formula = Volume ~ Girth, data = trees)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
## -8.065 -3.107  0.152  3.495  9.587 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -36.9435     3.3651  -10.98 7.62e-12 ***
## Girth         5.0659     0.2474   20.48  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 4.252 on 29 degrees of freedom
## Multiple R-squared:  0.9353, Adjusted R-squared:  0.9331 
## F-statistic: 419.4 on 1 and 29 DF,  p-value: < 2.2e-16
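
The output above matches what `summary()` prints for the model named in its `Call:` line. A short sketch of how to reproduce it and extract the Girth p-value programmatically (the object names `fit`, `coefs`, and `p_girth` are assumptions):

```r
# Fit the model shown in the Call: line and print the summary
fit <- lm(Volume ~ Girth, data = trees)
summary(fit)

# The coefficient table is a matrix; index it by row and column name
coefs <- summary(fit)$coefficients
p_girth <- coefs["Girth", "Pr(>|t|)"]

alpha <- 0.05
p_girth < alpha  # TRUE: the p-value is far below the significance level
```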

7) Continuing example for simple linear regression:

\(H_0\): \(\beta_1 = 0\)
\(H_A\): \(\beta_1 \neq 0\)

Since p-value \(< \alpha\), we reject \(H_0\) and conclude that there is significant evidence that \(\beta_1 \neq 0\).
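
To see where this p-value comes from, the t statistic and its two-sided p-value can be recomputed by hand from the Estimate, Std. Error, and residual degrees of freedom reported in the regression output. The variable names below are chosen for this sketch.

```r
# Values copied from the Girth row of the regression output
estimate <- 5.0659
std_error <- 0.2474
df <- 29  # residual degrees of freedom

# Test statistic for H0: beta_1 = 0
t_value <- estimate / std_error

# Two-sided p-value: probability of |T| >= |t_value| under H0
p_value <- 2 * pt(abs(t_value), df = df, lower.tail = FALSE)

round(t_value, 2)
p_value < 0.05
```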

This agrees with the linear trend that we see in the scatter plot.

8) Confirmed by p-value method

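library(ggplot2)  # provides ggplot(), aes(), and geom_point()
# Scatter plot of Volume against Girth for the built-in trees data
treeGraph <- ggplot(data = trees, aes(x = Girth, y = Volume)) + geom_point(size = 1)
treeGraph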