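# Load the speed-limit data (adjust the path to wherever Speed.csv is stored)
data = read.csv("C:/Users/Will/OneDrive/Documents/School/375T Predictive Analytics/HW4 and HW5/Speed.csv")
fat = data$FatalityRate     # response: fatality rate
state = data$StateControl   # 0/1 indicator for state control of speed limits
year = data$Year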
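model = lm(fat ~ year)        # simple linear regression of fatality rate on year
plot(model)                   # residual diagnostic plots
plot(year, fat)               # scatterplot with the fitted line overlaid
lines(year, fitted(model))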
summary(model)
##
## Call:
## lm(formula = fat ~ year)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.18959 -0.07550 -0.02576 0.09346 0.24606
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 91.320887 8.374227 10.9 1.28e-09 ***
## year -0.044870 0.004193 -10.7 1.75e-09 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.1164 on 19 degrees of freedom
## Multiple R-squared: 0.8577, Adjusted R-squared: 0.8502
## F-statistic: 114.5 on 1 and 19 DF, p-value: 1.75e-09
The coefficients of the least-squares line, rounded to three decimal places, are \[ \hat{\beta}_0 = 91.321, \quad \hat{\beta}_1 = -0.045 \]
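As an illustrative check, the fitted line can be evaluated by hand with the unrounded coefficients from the summary. The year 1995 is chosen only because it marks the start of state control:

\[ \hat{y}(1995) = 91.320887 - 0.044870 \times 1995 = 91.320887 - 89.515650 \approx 1.805 \]

Using the rounded coefficients instead gives \(91.321 - 0.045 \times 1995 = 1.546\), about 0.26 lower, because the rounding error in the slope is multiplied by a year near 2000. The rounded values describe the trend (a decline of roughly 0.045 per year) but should not be used for prediction.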
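# year * state expands to year + state + year:state (main effects plus interaction)
model2 = lm(fat ~ year * state)
plot(model2)                  # residual diagnostic plots
plot(year, fat)               # scatterplot with the fitted values overlaid
lines(year, fitted(model2))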
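A minimal sketch of what the interaction formula does, run on hypothetical synthetic data (the `sim` data frame, its generating slopes, and the coding of 1995 as `state = 1` are all assumptions, not the real Speed.csv). It checks that `year * state` is the same model as `year + state + year:state`, and that the per-period slopes read off the interaction coefficients match separate regressions on each period:

```r
# Hypothetical synthetic data standing in for Speed.csv (NOT the real data):
# a fatality rate that falls steeply until 1995, then more slowly.
set.seed(1)
sim = data.frame(year = 1987:2007)
sim$state = as.numeric(sim$year >= 1995)   # assumed 0/1 coding of the indicator
sim$fat = 2.5 - 0.08 * (sim$year - 1987) +
  0.05 * sim$state * (sim$year - 1995) +
  rnorm(nrow(sim), sd = 0.05)

# `year * state` is shorthand for `year + state + year:state`
m_short = lm(fat ~ year * state, data = sim)
m_long  = lm(fat ~ year + state + year:state, data = sim)
stopifnot(isTRUE(all.equal(coef(m_short), coef(m_long))))

# Per-period slopes recovered from the interaction coefficients
b = coef(m_short)
slope_before = b[["year"]]                        # state == 0
slope_after  = b[["year"]] + b[["year:state"]]    # state == 1

# The same slopes come from fitting each period on its own
fit_before = lm(fat ~ year, data = sim[sim$state == 0, ])
fit_after  = lm(fat ~ year, data = sim[sim$state == 1, ])
stopifnot(isTRUE(all.equal(slope_before, coef(fit_before)[["year"]])),
          isTRUE(all.equal(slope_after,  coef(fit_after)[["year"]])))

print(c(before = slope_before, after = slope_after))
```

Assuming `StateControl` is read in as a numeric 0/1 column, the same two lines applied to `coef(model2)` give the slopes of the two fitted lines for the real data; if it were read as a factor, the interaction coefficient would be named differently (e.g. `year:state1`).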
Adding the state-control variable and its interaction with year produced a more interpretable model. Its residuals appear randomly scattered, with no discernible pattern.
The residuals of the year-only model, by contrast, are not randomly scattered: that model underestimates the fatality rate before 1990 and after 2002 and overestimates it between those years. We can infer from the graph that the fatality rate declined faster before states took control of speed limits in 1995. The fitted least-squares model is
\[ \hat{y} = \hat{\beta}_0 + \hat{\beta}_1 X_1 + \hat{\beta}_2 X_2 + \hat{\beta}_3 X_1 X_2 \] where \(X_1\) is the year, \(X_2\) is the state-control indicator, and \[ \hat{\beta}_0 = 216.231, \quad \hat{\beta}_1 = -0.176, \quad \hat{\beta}_2 = -161.377, \quad \hat{\beta}_3 = 0.081 \]
Because \(X_2\) is an indicator, with \(X_2 = 0\) prior to 1995 and \(X_2 = 1\) after 1995, the fitted model separates into two lines, \(\hat{y}_1\) and \(\hat{y}_2\):
\[ \hat{y}_1 = \hat{\beta}_0 + \hat{\beta}_1 X_1 \] has slope \(\hat{\beta}_1 = -0.176\), and
\[ \hat{y}_2 = (\hat{\beta}_0 + \hat{\beta}_2) + (\hat{\beta}_1 + \hat{\beta}_3) X_1 = \tilde{\beta}_0 + \tilde{\beta}_1 X_1 \] has slope \(\tilde{\beta}_1 = \hat{\beta}_1 + \hat{\beta}_3 = -0.095\).
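The arithmetic behind the two lines, using the rounded coefficients of the interaction model, works out as follows. The intercepts refer to \(X_1 = 0\) (year 0), far outside the data, so they only locate the lines and are not interpretable on their own.

\[
\begin{aligned}
\tilde{\beta}_0 &= \hat{\beta}_0 + \hat{\beta}_2 = 216.231 + (-161.377) = 54.854 \\
\tilde{\beta}_1 &= \hat{\beta}_1 + \hat{\beta}_3 = -0.176 + 0.081 = -0.095
\end{aligned}
\]

The interaction coefficient \(\hat{\beta}_3 = 0.081\) is exactly the change in slope once states control speed limits: the yearly decline shrinks from 0.176 to 0.095, so the fatality rate fell only about half as fast afterward (\(0.095 / 0.176 \approx 0.54\)).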