Using the “cars” dataset in R, build a linear model for stopping distance as a function of speed and replicate the analysis from Chapter 3 of your textbook (visualization, evaluation of model quality, and residual analysis).
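First, we look at the built-in cars dataset. It ships with base R (package datasets), so no loading step is required; head() shows its first six rows.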
head(cars)
##   speed dist
## 1     4    2
## 2     4   10
## 3     7    4
## 4     7   22
## 5     8   16
## 6     9   10
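Before modeling, it can help to confirm the structure and range of the data. The sketch below uses only base R functions; per the dataset documentation (?cars), speed is in mph and dist is in feet. The variable name n_obs is illustrative.

```r
# data() is optional here: cars is already available from the datasets package
data(cars)

# Column types and a preview of the values
str(cars)

# Min, quartiles, mean, and max for each column
summary(cars)

# Number of observations
n_obs <- nrow(cars)
print(n_obs)
```
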
Next, we plot the data. The scatter plot shows that stopping distance tends to increase as speed increases.
plot(cars, xlab = "Speed", ylab = "Stopping distance")
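The upward trend in the scatter plot can be quantified with the Pearson correlation coefficient. In simple linear regression, the square of this correlation equals the model's Multiple R-squared, so this number anticipates the fit quality reported later. This is a minimal sketch; the names r and ct are illustrative.

```r
# Pearson correlation between speed and stopping distance
r <- cor(cars$speed, cars$dist)
print(r)

# With one predictor, r^2 equals the Multiple R-squared of lm(dist ~ speed)
print(r^2)

# Significance test for the correlation (H0: true correlation is 0)
ct <- cor.test(cars$speed, cars$dist)
print(ct)
```
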
Now, let’s fit a simple linear regression with a single predictor. In this model, the independent variable (input) is speed and the dependent variable (output) is stopping distance. We use the lm() function to fit the model and summary() to inspect the estimated relationship between speed and stopping distance. Based on the coefficient estimates in the summary, the fitted equation is:
stopping distance = -17.5791 + 3.9324 * speed
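To see where these coefficients come from, the least-squares estimates for a single predictor have a closed form: the slope is cov(x, y) / var(x) and the intercept is mean(y) minus the slope times mean(x). The sketch below computes them by hand and cross-checks them against lm(); the names x, y, b0, and b1 are illustrative.

```r
x <- cars$speed
y <- cars$dist

# Closed-form ordinary least squares for one predictor
b1 <- cov(x, y) / var(x)        # slope
b0 <- mean(y) - b1 * mean(x)    # intercept
print(c(intercept = b0, slope = b1))

# Cross-check with lm(): the estimates should agree up to rounding
print(coef(lm(dist ~ speed, data = cars)))
```
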
cars.lm <- lm(dist ~ speed, data = cars)
summary(cars.lm)
##
## Call:
## lm(formula = dist ~ speed, data = cars)
##
## Residuals:
##     Min      1Q  Median      3Q     Max
## -29.069  -9.525  -2.272   9.215  43.201
##
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept) -17.5791     6.7584  -2.601   0.0123 *
## speed         3.9324     0.4155   9.464 1.49e-12 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 15.38 on 48 degrees of freedom
## Multiple R-squared: 0.6511, Adjusted R-squared: 0.6438
## F-statistic: 89.57 on 1 and 48 DF, p-value: 1.49e-12
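Both coefficients are statistically significant at the 5% level (p = 0.0123 for the intercept and p = 1.49e-12 for speed). The slope means that each additional mph of speed adds about 3.93 ft of stopping distance on average. The Multiple R-squared of 0.6511 means that speed explains roughly 65% of the variation in stopping distance, and the residual standard error of 15.38 ft is the typical size of a prediction error. The negative intercept (-17.58 ft at zero speed) has no physical meaning; it is an extrapolation outside the observed speed range of 4 to 25 mph.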
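The quality measures in the printed summary can also be extracted programmatically, which makes them easier to report or compare across models. The sketch below refits the model so it runs on its own, recomputes R-squared from its definition (1 - SSE/SST) as a check, and uses confint() from base R's stats package for 95% coefficient intervals. The names s, sse, sst, r2_manual, and ci are illustrative.

```r
cars.lm <- lm(dist ~ speed, data = cars)
s <- summary(cars.lm)

print(s$r.squared)      # Multiple R-squared
print(s$adj.r.squared)  # Adjusted R-squared
print(s$sigma)          # Residual standard error (ft)
print(s$fstatistic)     # F value with numerator and denominator df

# R-squared from its definition: 1 - SSE / SST
sse <- sum(resid(cars.lm)^2)
sst <- sum((cars$dist - mean(cars$dist))^2)
r2_manual <- 1 - sse / sst
print(r2_manual)

# 95% confidence intervals for the intercept and slope
ci <- confint(cars.lm, level = 0.95)
print(ci)
```
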
Next, we overlay the fitted regression line on the scatter plot.
plot(cars, xlab = "Speed", ylab = "Stopping distance")
abline(cars.lm)
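One way to show the uncertainty of the fitted line on the same plot is to add a 95% confidence band computed with predict(..., interval = "confidence"). This is a sketch; the axis units come from the dataset documentation, and the names grid, band, and at20 (including the 20 mph example speed) are illustrative choices.

```r
cars.lm <- lm(dist ~ speed, data = cars)

# Grid of speeds spanning the observed range
grid <- data.frame(speed = seq(min(cars$speed), max(cars$speed), length.out = 100))

# Fitted values with lower and upper 95% confidence limits for the mean response
band <- predict(cars.lm, newdata = grid, interval = "confidence", level = 0.95)

plot(cars, xlab = "Speed (mph)", ylab = "Stopping distance (ft)")
abline(cars.lm)
lines(grid$speed, band[, "lwr"], lty = 2)  # lower confidence limit
lines(grid$speed, band[, "upr"], lty = 2)  # upper confidence limit

# Estimated mean stopping distance and interval at a single speed
at20 <- predict(cars.lm, newdata = data.frame(speed = 20), interval = "confidence")
print(at20)
```

Using interval = "prediction" instead would give wider limits for an individual new observation rather than for the mean response.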
Plotting the residuals. The residuals versus fitted values plot shows the residuals scattered roughly evenly above and below zero, with no obvious systematic pattern.
plot(fitted(cars.lm), resid(cars.lm), xlab = "Fitted values", ylab = "Residuals")
abline(h = 0, lty = 2)  # reference line at zero
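Note that least squares with an intercept forces the residuals to average zero and to be uncorrelated with the predictor, so their balance around zero holds by construction. The informative part of the residual plot is whether the scatter shows curvature or a spread that changes with the fitted values. The sketch below checks the two identities, compares mean absolute residuals for the lower and upper halves of the fitted values (a rough, illustrative spread check, not a formal test), and draws base R's built-in residuals-versus-fitted diagnostic. The names res, fit, and upper_half are illustrative.

```r
cars.lm <- lm(dist ~ speed, data = cars)
res <- resid(cars.lm)
fit <- fitted(cars.lm)

# Identities guaranteed by OLS with an intercept (both are zero up to rounding)
print(mean(res))
print(sum(res * cars$speed))

# Rough spread check: mean absolute residual below vs above the median fitted value
upper_half <- fit > median(fit)
print(tapply(abs(res), upper_half, mean))

# Base R diagnostic: residuals vs fitted, with a smoothed trend line
plot(cars.lm, which = 1)
```
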
Normal Q-Q plot. The plot suggests some right skew: points deviate from the straight reference line, so the residuals may not be exactly normally distributed.
qqnorm(resid(cars.lm))
qqline(resid(cars.lm))
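The visual impression from the Q-Q plot can be supplemented with a formal normality test and a skewness estimate. The sketch below uses shapiro.test() from base R's stats package (null hypothesis: the residuals are normally distributed) and a simple moment-based skewness formula, where a positive value indicates a longer right tail. The names res, sw, and skew are illustrative, and the skewness formula is one common definition among several.

```r
cars.lm <- lm(dist ~ speed, data = cars)
res <- resid(cars.lm)

# Shapiro-Wilk test; a small p-value is evidence against normality
sw <- shapiro.test(res)
print(sw)

# Moment-based sample skewness
centered <- res - mean(res)
skew <- mean(centered^3) / mean(centered^2)^1.5
print(skew)

# Base R's built-in Normal Q-Q diagnostic for lm objects
plot(cars.lm, which = 2)
```

A median residual below the mean residual of zero (here -2.272, per the summary above) is consistent with the right skew seen in the plot.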