R Markdown

For this log, I am using the USArrests dataset. I will look at how a state's urban population percentage (UrbanPop) and assault arrests (Assault) relate to murder arrests (Murder). Assault and murder arrests are both measured per 100,000 residents.

data("USArrests")
head(USArrests)
##            Murder Assault UrbanPop Rape
## Alabama      13.2     236       58 21.2
## Alaska       10.0     263       48 44.5
## Arizona       8.1     294       80 31.0
## Arkansas      8.8     190       50 19.5
## California    9.0     276       91 40.6
## Colorado      7.9     204       78 38.7
summary(USArrests)
##      Murder          Assault         UrbanPop          Rape      
##  Min.   : 0.800   Min.   : 45.0   Min.   :32.00   Min.   : 7.30  
##  1st Qu.: 4.075   1st Qu.:109.0   1st Qu.:54.50   1st Qu.:15.07  
##  Median : 7.250   Median :159.0   Median :66.00   Median :20.10  
##  Mean   : 7.788   Mean   :170.8   Mean   :65.54   Mean   :21.23  
##  3rd Qu.:11.250   3rd Qu.:249.0   3rd Qu.:77.75   3rd Qu.:26.18  
##  Max.   :17.400   Max.   :337.0   Max.   :91.00   Max.   :46.00
logmod <- lm(Murder ~ UrbanPop * Assault, data = USArrests)
summary(logmod)
## 
## Call:
## lm(formula = Murder ~ UrbanPop * Assault, data = USArrests)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -4.4341 -1.7328 -0.3644  1.3557  7.4457 
## 
## Coefficients:
##                    Estimate Std. Error t value Pr(>|t|)    
## (Intercept)      -0.7718871  3.4131359  -0.226 0.822085    
## UrbanPop          0.0208443  0.0549756   0.379 0.706317    
## Assault           0.0676569  0.0181509   3.727 0.000528 ***
## UrbanPop:Assault -0.0003792  0.0002806  -1.351 0.183228    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2.558 on 46 degrees of freedom
## Multiple R-squared:  0.6763, Adjusted R-squared:  0.6552 
## F-statistic: 32.03 on 3 and 46 DF,  p-value: 2.477e-11

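From this summary, we can see that β0 = -0.7719, β1 = 0.02084, β2 = 0.06766, and the interaction coefficient β3 = -0.0003792. Because the model includes the UrbanPop:Assault interaction, β1 and β2 are conditional effects. β1 means that, for a state with 0 assault arrests, every 1 percentage point increase in urban population is associated with an increase of about 0.02 murder arrests per 100,000 residents. β2 means that, for a state with 0% urban population, every additional assault arrest per 100,000 residents is associated with an increase of about 0.0677 murder arrests per 100,000 residents. The negative β3 means that the Assault slope shrinks slightly as urban population rises. These numbers are small mainly because of the units of the predictors (Assault ranges from 45 to 337), so a small coefficient does not by itself mean that there is no relationship.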
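Because the model includes an UrbanPop:Assault interaction, the slope for Assault is not a single number; it depends on the value of UrbanPop. The sketch below refits the same model and computes that conditional slope at two urban population levels. The helper name `assault_slope` and the example values 50 and 80 are my own choices for illustration, not part of the original analysis.

```r
# Refit the model from above so this block runs on its own.
data("USArrests")
logmod <- lm(Murder ~ UrbanPop * Assault, data = USArrests)
b <- coef(logmod)

# Hypothetical helper: slope of Assault at a given urban population percentage.
# With an interaction, d(Murder)/d(Assault) = b_Assault + b_interaction * UrbanPop.
assault_slope <- function(urbanpop) {
  unname(b["Assault"] + b["UrbanPop:Assault"] * urbanpop)
}

# Conditional Assault slopes for states that are 50% and 80% urban.
assault_slope(c(50, 80))
```

Using the rounded coefficients from the summary, these slopes come out to roughly 0.049 and 0.037, so the estimated effect of assault arrests is somewhat weaker in more urban states, although the interaction term itself is not statistically significant.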

The hypotheses for the overall F-test are:

H0: Murder arrests do not have a linear relationship with any of the predictors, meaning all slope coefficients (every beta except β0) equal 0.

Ha: Murder arrests have a linear relationship with at least one of the predictors, meaning at least one slope coefficient does not equal 0.

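The F-statistic of 32.03 has a p-value of 2.477e-11, far below alpha = 0.05, so we reject H0: at least one term in the model is linearly related to murder arrests. The summary also gives a t value and p-value for each predictor. In our model, the only p-value lower than alpha = 0.05 is for the assault predictor, at 0.000528. That means there is evidence of a relationship between assault arrests and murder arrests with the other terms in the model. With p-values of 0.706 and 0.183, there is not enough evidence of a relationship between urban population and murder arrests, or of an interaction between urban population and assault arrests.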
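The two kinds of tests discussed here can also be pulled out of the model programmatically. The following is a minimal sketch, assuming the same model as above: it reproduces the overall F-test by comparing an intercept-only model against the full model, then checks each coefficient's p-value against alpha. The names `nullmod`, `ftest`, `alpha`, and `pvals` are introduced only for this example.

```r
# Refit the model from above so this block runs on its own.
data("USArrests")
logmod <- lm(Murder ~ UrbanPop * Assault, data = USArrests)

# Overall F-test: intercept-only model (H0) versus the full model (Ha).
nullmod <- lm(Murder ~ 1, data = USArrests)
ftest <- anova(nullmod, logmod)
ftest

# Per-coefficient t-tests: extract the p-value column and compare to alpha.
alpha <- 0.05
pvals <- coef(summary(logmod))[, "Pr(>|t|)"]
pvals < alpha
```

The F statistic in the second row of `ftest` should match the one printed at the bottom of `summary(logmod)`, and the last line returns TRUE only for the Assault term.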

Confidence and Prediction Intervals

We can create a confidence interval for mean Murder arrests at a given UrbanPop value and Assault value.

newdata <- data.frame(UrbanPop = 80, Assault = 120)
confy <- predict(logmod, newdata, interval = "confidence")
confy
##        fit      lwr      upr
## 1 5.374637 3.969547 6.779727

Above, we produced a confidence interval for murder arrests at an urban population of 80% and 120 assault arrests per 100,000 residents. The confidence interval is (3.97, 6.78), meaning that we are 95% confident that the average murder arrest rate for states with 120 assault arrests per 100,000 residents and an urban population of 80% is between 3.97 and 6.78 per 100,000 residents.

We can also create a prediction interval:

predy <- predict(logmod, newdata, interval = "prediction")
predy
##        fit        lwr      upr
## 1 5.374637 0.03792419 10.71135

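This prediction interval, (0.04, 10.71), is wider because it covers the murder arrest rate of an individual state rather than the mean rate across all states with these predictor values. The variance of a single value is larger than the variance of the estimated mean, since it also includes the residual variance around the regression line.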
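To make the difference in width concrete, both intervals can be rebuilt by hand from the standard error of the fitted mean and the residual standard error. This is a sketch under the usual linear model assumptions, not a replacement for `predict()`; the names `p`, `tcrit`, `ci_half`, and `pi_half` are introduced here for illustration.

```r
# Refit the model and rebuild newdata so this block runs on its own.
data("USArrests")
logmod <- lm(Murder ~ UrbanPop * Assault, data = USArrests)
newdata <- data.frame(UrbanPop = 80, Assault = 120)

# se.fit = TRUE also returns the standard error of the estimated mean,
# the residual standard error, and the residual degrees of freedom.
p <- predict(logmod, newdata, se.fit = TRUE)
tcrit <- qt(0.975, df = p$df)  # two-sided 95% critical value

# Confidence interval: uncertainty in the estimated mean only.
ci_half <- tcrit * p$se.fit
# Prediction interval: add the residual variance for a single new state.
pi_half <- tcrit * sqrt(p$se.fit^2 + p$residual.scale^2)

rbind(confidence = c(p$fit - ci_half, p$fit + ci_half),
      prediction = c(p$fit - pi_half, p$fit + pi_half))
```

The only difference between the two half-widths is the extra `p$residual.scale^2` term under the square root, which is why the prediction interval is always the wider of the two.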