df_uk <- data.frame(datasets::Seatbelts)
summary(df_uk)
## DriversKilled drivers front rear
## Min. : 60.0 Min. :1057 Min. : 426.0 Min. :224.0
## 1st Qu.:104.8 1st Qu.:1462 1st Qu.: 715.5 1st Qu.:344.8
## Median :118.5 Median :1631 Median : 828.5 Median :401.5
## Mean :122.8 Mean :1670 Mean : 837.2 Mean :401.2
## 3rd Qu.:138.0 3rd Qu.:1851 3rd Qu.: 950.8 3rd Qu.:456.2
## Max. :198.0 Max. :2654 Max. :1299.0 Max. :646.0
## kms PetrolPrice VanKilled law
## Min. : 7685 Min. :0.08118 Min. : 2.000 Min. :0.0000
## 1st Qu.:12685 1st Qu.:0.09258 1st Qu.: 6.000 1st Qu.:0.0000
## Median :14987 Median :0.10448 Median : 8.000 Median :0.0000
## Mean :14994 Mean :0.10362 Mean : 9.057 Mean :0.1198
## 3rd Qu.:17202 3rd Qu.:0.11406 3rd Qu.:12.000 3rd Qu.:0.0000
## Max. :21626 Max. :0.13303 Max. :17.000 Max. :1.0000
dplyr::glimpse(df_uk)  # glimpse() comes from dplyr, which is not attached above
## Rows: 192
## Columns: 8
## $ DriversKilled <dbl> 107, 97, 102, 87, 119, 106, 110, 106, 107, 134, 147, 180…
## $ drivers <dbl> 1687, 1508, 1507, 1385, 1632, 1511, 1559, 1630, 1579, 16…
## $ front <dbl> 867, 825, 806, 814, 991, 945, 1004, 1091, 958, 850, 1109…
## $ rear <dbl> 269, 265, 319, 407, 454, 427, 522, 536, 405, 437, 434, 4…
## $ kms <dbl> 9059, 7685, 9963, 10955, 11823, 12391, 13460, 14055, 121…
## $ PetrolPrice <dbl> 0.10297181, 0.10236300, 0.10206249, 0.10087330, 0.101019…
## $ VanKilled <dbl> 12, 6, 12, 8, 10, 13, 11, 6, 10, 16, 13, 14, 14, 6, 8, 1…
## $ law <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,…
The scatterplot below shows DriversKilled against PetrolPrice:
plot(df_uk$PetrolPrice, df_uk$DriversKilled)
model <- lm(DriversKilled ~ PetrolPrice + I(PetrolPrice^2) + law + PetrolPrice*law, data=df_uk)
summary(model)
##
## Call:
## lm(formula = DriversKilled ~ PetrolPrice + I(PetrolPrice^2) +
## law + PetrolPrice * law, data = df_uk)
##
## Residuals:
## Min 1Q Median 3Q Max
## -45.264 -16.141 -4.538 13.755 61.445
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 382.08 122.01 3.131 0.00202 **
## PetrolPrice -4423.33 2398.92 -1.844 0.06678 .
## I(PetrolPrice^2) 18480.39 11679.10 1.582 0.11526
## law 63.24 292.79 0.216 0.82922
## PetrolPrice:law -692.24 2514.91 -0.275 0.78342
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 22.98 on 187 degrees of freedom
## Multiple R-squared: 0.1974, Adjusted R-squared: 0.1802
## F-statistic: 11.5 on 4 and 187 DF, p-value: 2.29e-08
In the full model, PetrolPrice falls just short of significance at the 5% level (p = 0.067); it becomes significant once the quadratic term is removed (shown below). The dichotomous variable “law”, which indicates whether the law requiring seat belts to be worn was in effect, is surprisingly not significant (p = 0.829), but it too becomes significant once the quadratic and interaction terms are removed (shown below). Neither the quadratic term (p = 0.115) nor the interaction between PetrolPrice and law (p = 0.783) is significant.
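One plausible reason the coefficients lose significance in the larger model is collinearity: the quadratic term moves almost in lockstep with PetrolPrice, and the interaction term moves almost in lockstep with law. The sketch below is an illustrative check, not part of the original analysis; the names `cor_quadratic` and `cor_interaction` are introduced here for the example only.

```r
# Sketch: how strongly do the extra terms overlap with the main effects?
df_uk <- data.frame(datasets::Seatbelts)

# Correlation between the linear and quadratic petrol price terms
cor_quadratic <- cor(df_uk$PetrolPrice, df_uk$PetrolPrice^2)

# Correlation between law and its interaction with petrol price
cor_interaction <- cor(df_uk$law, df_uk$PetrolPrice * df_uk$law)

# Values close to 1 mean the model cannot easily separate the two effects,
# which inflates the standard errors of both coefficients
print(c(quadratic = cor_quadratic, interaction = cor_interaction))
```

Correlations near 1 would be consistent with the large standard errors reported for the full model, although they do not by themselves prove that collinearity is the only cause.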
model2 <- lm(DriversKilled ~ PetrolPrice + law, data=df_uk)
summary(model2)
##
## Call:
## lm(formula = DriversKilled ~ PetrolPrice + law, data = df_uk)
##
## Residuals:
## Min 1Q Median 3Q Max
## -47.805 -17.280 -5.101 14.178 62.703
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 190.591 15.236 12.509 < 2e-16 ***
## PetrolPrice -635.306 148.549 -4.277 3.01e-05 ***
## law -16.326 5.556 -2.939 0.00371 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 23.01 on 189 degrees of freedom
## Multiple R-squared: 0.1866, Adjusted R-squared: 0.178
## F-statistic: 21.68 on 2 and 189 DF, p-value: 3.329e-09
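Dropping the quadratic and interaction terms can also be checked with a nested-model F-test instead of relying on the individual t-tests alone. The sketch below refits both models under the illustrative names `model_reduced` and `model_full` (assumed here to mirror `model2` and `model`) and compares them with `anova()`.

```r
# Sketch: partial F-test for the two terms dropped from the full model
df_uk <- data.frame(datasets::Seatbelts)

# Reduced model: same specification as model2
model_reduced <- lm(DriversKilled ~ PetrolPrice + law, data = df_uk)

# Full model: same terms as model, with the interaction written explicitly
model_full <- lm(DriversKilled ~ PetrolPrice + I(PetrolPrice^2) + law +
                   PetrolPrice:law, data = df_uk)

# H0: the quadratic and interaction coefficients are both zero
comparison <- anova(model_reduced, model_full)
print(comparison)
```

A large p-value in the second row of the table would indicate that the two extra terms do not jointly improve the fit, which supports keeping the simpler model.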
plot(model2)
Nothing too alarming shows up in the residual plots. The Residuals vs Fitted plot shows no obvious sign of heteroskedasticity. The Q-Q plot suggests the residuals are approximately normal. The Residuals vs Leverage plot shows no high-leverage outliers.
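The visual impressions from the diagnostic plots can be backed up with a few numeric checks. The sketch below is one possible set of checks using base R only; the object names, the auxiliary regression used as a rough heteroskedasticity check, and the 4/n Cook's distance cutoff are assumptions made for this example, not part of the original analysis.

```r
# Sketch: numeric companions to the diagnostic plots
df_uk <- data.frame(datasets::Seatbelts)
model_check <- lm(DriversKilled ~ PetrolPrice + law, data = df_uk)

# Normality of residuals: Shapiro-Wilk test (H0: residuals are normal)
normality <- shapiro.test(residuals(model_check))

# Rough heteroskedasticity check: regress squared residuals on fitted values;
# a small p-value for the slope suggests non-constant variance
sq_resid <- residuals(model_check)^2
fit <- fitted(model_check)
aux <- lm(sq_resid ~ fit)
aux_p <- summary(aux)$coefficients[2, 4]

# Influence: Cook's distance with the common 4/n rule of thumb
cooks <- cooks.distance(model_check)
flagged <- which(cooks > 4 / nrow(df_uk))

print(normality$p.value)
print(aux_p)
print(flagged)
```

These checks are sensitive to sample size and to the chosen cutoff, so they are best read alongside the plots and not as a replacement for them.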
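Conclusion: A linear model appears appropriate, based on the roughly constant negative relationship between PetrolPrice and DriversKilled seen in the scatterplot and on the significant coefficients for PetrolPrice (p < 0.001) and law (p = 0.004) in the reduced model. That model explains only about 19% of the variance in DriversKilled (R-squared = 0.1866), so most of the variation remains unexplained.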