P-VALUES

What are p-values?

A p-value is the probability of obtaining test results at least as extreme as the result actually observed, under the assumption that the null hypothesis is correct.

Where does it come from?

This probability is computed from a test statistic, for example a one-proportion z statistic or a one-sample t statistic: \[z=\frac{\hat{p}-p_0}{\sqrt{\frac{p_0(1-p_0)}{n}}}\] \[t=\frac{\bar{x}-\mu_0}{\frac{s}{\sqrt{n}}}\]
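As a minimal sketch of how each statistic turns into a p-value, the R snippet below computes a two-sided p-value from both formulas. The proportion numbers (`p_hat`, `p0`, `n`) and the hypothesized mean `mu0 = 65` are made up for illustration; only the USArrests data set comes from these slides.

```r
# One-proportion z test (hypothetical numbers, for illustration only)
p_hat <- 0.56   # observed sample proportion
p0    <- 0.50   # proportion under the null hypothesis
n     <- 200    # sample size
z     <- (p_hat - p0) / sqrt(p0 * (1 - p0) / n)
p_z   <- 2 * pnorm(-abs(z))                        # two-sided tail area of N(0, 1)

# One-sample t test on UrbanPop (mu0 is an assumed null value)
x     <- USArrests$UrbanPop
mu0   <- 65
t_obs <- (mean(x) - mu0) / (sd(x) / sqrt(length(x)))
p_t   <- 2 * pt(-abs(t_obs), df = length(x) - 1)   # two-sided tail area of t

c(z = z, p_z = p_z, t = t_obs, p_t = p_t)
```

The t-based value should agree with `t.test(x, mu = mu0)$p.value`, which is a quick way to check the hand calculation.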

When do we use it?

The p-value is usually used as a measure of the evidence against the null hypothesis in a hypothesis test. The lower the value, the stronger the evidence for rejecting the null hypothesis; the higher the value, the weaker that evidence.

The p-value corresponds to the shaded tail area under the sampling distribution of the test statistic, expressed as a probability.
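To make the shaded area concrete, here is a small sketch that integrates the t density beyond an observed statistic and compares the result with R's built-in tail probability. The values `t_obs = 2` and `df = 20` are hypothetical and chosen only for illustration.

```r
t_obs <- 2.0    # hypothetical observed t statistic
df    <- 20     # hypothetical degrees of freedom

# Area under the t density to the right of t_obs (the upper shaded tail)
upper_area <- integrate(dt, lower = t_obs, upper = Inf, df = df)$value

# Same area from the cumulative distribution function
upper_tail <- pt(t_obs, df = df, lower.tail = FALSE)

# A two-sided p-value shades both tails, so the area doubles
p_two_sided <- 2 * upper_tail

c(upper_area = upper_area, upper_tail = upper_tail, p_two_sided = p_two_sided)
```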

When running a linear regression, the summary output reports a p-value for each coefficient:

linmod<-lm(UrbanPop~Murder,data=USArrests) 
summary(linmod)

Call:
lm(formula = UrbanPop ~ Murder, data = USArrests)

Residuals:
    Min      1Q  Median      3Q     Max 
-32.248  -9.953   1.255  12.482  25.180 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)  63.7393     4.2597  14.963   <2e-16 ***
Murder        0.2312     0.4785   0.483    0.631    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 14.59 on 48 degrees of freedom
Multiple R-squared:  0.00484,   Adjusted R-squared:  -0.01589 
F-statistic: 0.2335 on 1 and 48 DF,  p-value: 0.6312

To be specific

On the previous slide, the rightmost column, Pr(>|t|), holds the p-values for the simple linear regression model \(Y_i=\beta_0+\beta_1X_i+\epsilon_i\). The p-value for the Murder slope is 0.631. With such a high p-value we fail to reject the null hypothesis \(H_0:\beta_1=0\).
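The Pr(>|t|) entry can be reproduced by hand from the t value and the residual degrees of freedom shown in the summary output. The sketch below refits the same model and compares the manual two-sided tail area with the value R reports; the variable names are illustrative.

```r
linmod <- lm(UrbanPop ~ Murder, data = USArrests)
coefs  <- summary(linmod)$coefficients

t_val <- coefs["Murder", "t value"]   # 0.483 in the summary output
df    <- df.residual(linmod)          # 48 residual degrees of freedom

# Two-sided p-value: probability of a |t| at least this large under H0
p_manual <- 2 * pt(-abs(t_val), df = df)
p_table  <- coefs["Murder", "Pr(>|t|)"]

c(manual = p_manual, table = p_table)
```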

Plotting our linear model:

[Figure: lm UrbanPop ~ Murder]

As the p-value from our hypothesis test suggested, the linear association between Murder and UrbanPop is minimal: the fitted line is nearly flat, and R-squared is only about 0.005.
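A minimal base-R sketch of such a plot, assuming a scatterplot with the fitted line drawn on top (the original figure may have been produced differently):

```r
linmod <- lm(UrbanPop ~ Murder, data = USArrests)

# Scatterplot of the raw data with the fitted regression line
plot(UrbanPop ~ Murder, data = USArrests, main = "lm UrbanPop ~ Murder")
abline(linmod, col = "red")

# Sample correlation; its square equals the Multiple R-squared of the model
r <- cor(USArrests$Murder, USArrests$UrbanPop)
c(r = r, r_squared = r^2)
```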

Understanding the data better

Here is a plotly graph of the murder rate by state. As expected, the highest rates are not in the states with the largest urban population share, which is consistent with the p-value we obtained from our test.
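The original graph is not reproduced in this text, so the following is only a sketch of one way to build a comparable chart. The bar chart type, the ordering by murder rate, and the hover text are assumptions, and the plotly package must be installed for the chart to be drawn.

```r
# State names are stored as row names in USArrests
arrests <- data.frame(state = rownames(USArrests), USArrests)

# Order states from highest to lowest murder rate so the bars read left to right
arrests <- arrests[order(arrests$Murder, decreasing = TRUE), ]
arrests$state <- factor(arrests$state, levels = arrests$state)

# Build the chart only if plotly is available (assumed, not confirmed by the slides)
if (requireNamespace("plotly", quietly = TRUE)) {
  fig <- plotly::plot_ly(
    arrests,
    x = ~state, y = ~Murder, type = "bar",
    text = ~paste("UrbanPop:", UrbanPop)   # show urban population share on hover
  )
  if (interactive()) print(fig)
}
```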