library(ggplot2)
Using the “cars” dataset in R, build a linear model for stopping distance as a function of speed and replicate the analysis of your textbook chapter 3 (visualization, quality evaluation of the model, and residual analysis).
carsdf <- datasets::cars
# Stopping distance is the response and speed is the predictor
plot(dist ~ speed, data = carsdf, xlab = 'Speed (mph)', ylab = 'Stopping distance (feet)')
obs <- lm(dist ~ speed, data = carsdf)
abline(obs)
summary(obs)
##
## Call:
## lm(formula = dist ~ speed, data = carsdf)
##
## Residuals:
##     Min      1Q  Median      3Q     Max
## -29.069  -9.525  -2.272   9.215  43.201
##
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept) -17.5791     6.7584  -2.601   0.0123 *
## speed         3.9324     0.4155   9.464 1.49e-12 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 15.38 on 48 degrees of freedom
## Multiple R-squared: 0.6511, Adjusted R-squared: 0.6438
## F-statistic: 89.57 on 1 and 48 DF, p-value: 1.49e-12
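The \(R^2\) value of 0.6511 is moderate rather than high: the model accounts for 65.11% of the variation in the response (64.38% after adjusting for the number of predictors). The slope is nevertheless highly significant (p = 1.49e-12), so there is clear evidence of a linear relationship between speed and stopping distance.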
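To make the \(R^2\) figures concrete, the sketch below recomputes them by hand from the residuals of a fit of stopping distance on speed, as the exercise asks. The names `fit`, `sse`, `sst`, `r2`, and `adj_r2` are illustrative and not part of the original analysis.

```r
# Sketch: recompute R-squared and adjusted R-squared from first principles.
cars_data <- datasets::cars
fit <- lm(dist ~ speed, data = cars_data)

sse <- sum(residuals(fit)^2)                          # variation left unexplained
sst <- sum((cars_data$dist - mean(cars_data$dist))^2) # total variation in dist
r2  <- 1 - sse / sst

n <- nrow(cars_data)  # number of observations
p <- 1                # number of predictors
adj_r2 <- 1 - (1 - r2) * (n - 1) / (n - p - 1)

round(c(r2 = r2, adj_r2 = adj_r2), 4)
```

The two values should agree with the Multiple R-squared and Adjusted R-squared lines reported by `summary()`.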
qqnorm(obs$residuals)
qqline(obs$residuals)
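The normal Q-Q plot shows the residuals lying close to the reference line over most of their range, so they appear approximately normally distributed. That supports the normality assumption behind the model’s standard errors and p-values, so the model appears to be fairly reliable.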
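A Q-Q plot checks only one assumption. As a possible complement, the sketch below plots the residuals against the fitted values and runs a Shapiro-Wilk test. It refits stopping distance on speed so that it runs on its own; `fit` and `sw` are illustrative names, and these extra checks are a suggestion rather than part of the original analysis.

```r
# Sketch: two further residual checks for the linear model.
fit <- lm(dist ~ speed, data = datasets::cars)

# Residuals vs. fitted values: look for curvature or a funnel shape,
# either of which would point to a missed trend or non-constant variance.
plot(fitted(fit), resid(fit),
     xlab = "Fitted stopping distance (feet)",
     ylab = "Residual (feet)")
abline(h = 0, lty = 2)

# Shapiro-Wilk test: a small p-value is evidence against normal residuals.
sw <- shapiro.test(resid(fit))
sw$p.value
```

If the points scatter evenly around the dashed zero line and the test does not reject normality, the residual analysis supports the model; otherwise a transformation of the response may be worth trying.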