Using R, build a regression model for data that interests you. Conduct residual analysis.
data("USArrests")
# Fit the linear regression model with Murder as the response variable
model <- lm(Murder ~ Assault + UrbanPop + Rape, data=USArrests)
summary(model)
##
## Call:
## lm(formula = Murder ~ Assault + UrbanPop + Rape, data = USArrests)
##
## Residuals:
## Min 1Q Median 3Q Max
## -4.3990 -1.9127 -0.3444 1.2557 7.4279
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 3.276639 1.737997 1.885 0.0657 .
## Assault 0.039777 0.005912 6.729 2.33e-08 ***
## UrbanPop -0.054694 0.027880 -1.962 0.0559 .
## Rape 0.061399 0.055740 1.102 0.2764
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.574 on 46 degrees of freedom
## Multiple R-squared: 0.6721, Adjusted R-squared: 0.6507
## F-statistic: 31.42 on 3 and 46 DF, p-value: 3.322e-11
par(mfrow=c(2, 2))
plot(model)
Was the linear model appropriate? Why or why not?
In conclusion, the linear model may not be the best option due to signs of non-linearity, potential outliers, and heteroscedasticity.