Using the “cars” dataset in R, build a linear model for stopping distance as a function of speed and replicate the analysis of your textbook chapter 3 (visualization, quality evaluation of the model, and residual analysis).
The data set used here is “cars”, which ships with R. It contains 50 observations of two variables: the speed of a car in miles per hour (speed) and the stopping distance of the car in feet (dist).
data("cars")
summary(cars)
##      speed           dist
##  Min.   : 4.0   Min.   :  2.00
##  1st Qu.:12.0   1st Qu.: 26.00
##  Median :15.0   Median : 36.00
##  Mean   :15.4   Mean   : 42.98
##  3rd Qu.:19.0   3rd Qu.: 56.00
##  Max.   :25.0   Max.   :120.00
plot(cars$speed,
cars$dist,
xlab="Speed of Cars (mph)",
ylab="Distances Taken to Stop (ft)")
Fit the linear regression model with the lm function and inspect it with summary. lm finds the line that most closely fits the measured data by minimizing the sum of the squared vertical distances (the residuals) between the line and the individual data points.
model_car <- lm(cars$dist ~ cars$speed)
summary(model_car)
##
## Call:
## lm(formula = cars$dist ~ cars$speed)
##
## Residuals:
##     Min      1Q  Median      3Q     Max
## -29.069  -9.525  -2.272   9.215  43.201
##
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept) -17.5791     6.7584  -2.601   0.0123 *
## cars$speed    3.9324     0.4155   9.464 1.49e-12 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 15.38 on 48 degrees of freedom
## Multiple R-squared: 0.6511, Adjusted R-squared: 0.6438
## F-statistic: 89.57 on 1 and 48 DF, p-value: 1.49e-12
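The summary reports a slope of about 3.93 ft of extra stopping distance per additional mph, and an R-squared of 0.6511, meaning speed accounts for roughly 65% of the variation in stopping distance. As a rough check on what lm is doing, the sketch below recomputes the intercept, slope, and R-squared from the closed-form least-squares formulas. The variable names (x, y, b0, b1, r2, fitted_manual) are illustrative and not part of the original analysis.

```r
# Reproduce the lm() estimates from the closed-form least-squares formulas.
data("cars")
x <- cars$speed
y <- cars$dist

# Slope: sample covariance of x and y divided by the sample variance of x.
b1 <- cov(x, y) / var(x)
# Intercept: chosen so the line passes through (mean(x), mean(y)).
b0 <- mean(y) - b1 * mean(x)

# R-squared: 1 minus residual sum of squares over total sum of squares.
fitted_manual <- b0 + b1 * x
r2 <- 1 - sum((y - fitted_manual)^2) / sum((y - mean(y))^2)

round(c(intercept = b0, slope = b1, r_squared = r2), 4)
```

The three rounded values should agree with the Estimate column and the Multiple R-squared line in the summary output above.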
Plot the data with the fitted regression line overlaid.
plot(cars$dist ~ cars$speed)
abline(model_car)
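The fitted line can also be used to predict stopping distance at speeds of interest. The sketch below refits the same model with the formula written as dist ~ speed and a data argument, because a model specified with cars$dist ~ cars$speed generally does not pick up new speed values passed to predict. The object names fit, new_speeds, and pred, and the choice of 10 and 20 mph, are assumptions made for illustration.

```r
# Refit the same model using column names plus a data argument,
# so that predict() can match the 'speed' column in newdata.
data("cars")
fit <- lm(dist ~ speed, data = cars)

# Hypothetical speeds (mph) at which to predict stopping distance.
new_speeds <- data.frame(speed = c(10, 20))

# Point predictions with 95% prediction intervals (columns fit, lwr, upr).
pred <- predict(fit, newdata = new_speeds, interval = "prediction")
pred
```

The fit column is simply the intercept plus the slope times the speed, so it should come out near 21.7 ft at 10 mph and 61.1 ft at 20 mph given the coefficients reported above.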
Checking normality of the residuals with a histogram.
res <- resid(model_car) # alternatively, model_car$residuals
hist(res)
Checking constant variance of the residuals by plotting them against the fitted values.
plot(fitted(model_car), resid(model_car))
abline(h = 0, lty = 2)
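If the variance is constant, the points should scatter in a band of roughly even width around the dashed line, with no funnel shape. As an informal numeric companion to the plot, the sketch below compares the standard deviation of the residuals for the lower and upper halves of the fitted values. This split is an ad hoc check chosen for illustration, not a formal test, and the names fit, fv, low, high, and spread are hypothetical.

```r
# Informal check of constant variance: compare residual spread
# between low and high fitted values (not a formal test).
data("cars")
fit <- lm(dist ~ speed, data = cars)
res <- resid(fit)
fv <- fitted(fit)

# Split the residuals at the median fitted value.
low <- res[fv <= median(fv)]
high <- res[fv > median(fv)]

# Standard deviation of the residuals in each half.
spread <- c(low = sd(low), high = sd(high))
spread
```

Two clearly different values would suggest that the residual spread changes with the fitted stopping distance, which is worth reading alongside the plot rather than in place of it.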
Checking for nearly normal residuals with Q-Q plot.
qqnorm(resid(model_car))
qqline(resid(model_car))
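Points that follow the reference line closely support the normality assumption, while systematic departures in the tails suggest skew or heavy tails. The residual quantiles in the summary (minimum -29.069, maximum 43.201) hint at a longer right tail. One possible numeric complement to the Q-Q plot is the Shapiro-Wilk test from base R, sketched below. It was not part of the original analysis, and the object names fit and sw are illustrative.

```r
# Shapiro-Wilk test on the residuals as a numeric complement to the Q-Q plot.
data("cars")
fit <- lm(dist ~ speed, data = cars)

# Null hypothesis: the residuals come from a normal distribution.
sw <- shapiro.test(resid(fit))
sw
```

A small p-value would count as evidence against normally distributed residuals. With only 50 observations, the test is best read together with the histogram and Q-Q plot.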