Using the “cars” dataset in R, build a linear model for stopping distance as a function of speed and replicate the analysis of your textbook chapter 3 (visualization, quality evaluation of the model, and residual analysis).

Solution:

Linear Model

linear_model <- lm(dist ~ speed, data = cars)
linear_model
## 
## Call:
## lm(formula = dist ~ speed, data = cars)
## 
## Coefficients:
## (Intercept)        speed  
##     -17.579        3.932

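This is the linear model for stopping distance as a function of speed. The fitted line is dist = -17.579 + 3.932 * speed, with speed in mph and dist in ft, so each additional 1 mph of speed is associated with about 3.93 ft of additional stopping distance. The negative intercept has no physical interpretation, because the data do not include a speed of zero.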
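As a minimal sketch of how the fitted coefficients can be used, the snippet below predicts stopping distance for a few speeds. The speeds in `new_speeds` are hypothetical values chosen only for illustration; they are not part of the original exercise.

```r
# Refit the model so this snippet runs on its own
linear_model <- lm(dist ~ speed, data = cars)

# Hypothetical speeds (mph), chosen only for illustration
new_speeds <- data.frame(speed = c(10, 15, 20))

# Predicted stopping distances (ft) from the fitted line
predicted <- predict(linear_model, newdata = new_speeds)
predicted

# The same prediction for 20 mph, computed directly from the coefficients
b <- coef(linear_model)
manual_20 <- unname(b["(Intercept)"] + b["speed"] * 20)
manual_20
```

Both approaches give the same number for 20 mph, which confirms that `predict()` is simply evaluating intercept + slope * speed.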

Visualization

plot(cars$speed, cars$dist, xlab = "Speed (mph)", ylab = "Stopping distance (ft)")
abline(linear_model)

The scatterplot shows a positive, roughly linear relationship between speed and stopping distance: the higher the speed, the longer the stopping distance.
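The visual impression can be backed with a number. A small sketch, assuming the Pearson correlation is an acceptable summary of the linear association here:

```r
# Pearson correlation between speed and stopping distance
r <- cor(cars$speed, cars$dist)
r

# In a one-predictor linear regression, the squared correlation
# equals the Multiple R-squared reported by summary()
r^2
```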

Quality Evaluation of the Model

summary(linear_model)
## 
## Call:
## lm(formula = dist ~ speed, data = cars)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -29.069  -9.525  -2.272   9.215  43.201 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -17.5791     6.7584  -2.601   0.0123 *  
## speed         3.9324     0.4155   9.464 1.49e-12 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 15.38 on 48 degrees of freedom
## Multiple R-squared:  0.6511, Adjusted R-squared:  0.6438 
## F-statistic: 89.57 on 1 and 48 DF,  p-value: 1.49e-12

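The summary reports the distribution of the residuals, the coefficient estimates, the residual standard error, and the Multiple and Adjusted R-squared. The median residual (-2.272) is close to zero and the first and third quartiles (-9.525 and 9.215) are nearly symmetric, although the maximum (43.201) is larger in magnitude than the minimum (-29.069). The slope for speed is highly significant (p = 1.49e-12). The Multiple R-squared of 0.6511 and the Adjusted R-squared of 0.6438 mean that speed explains about 65% of the variation in stopping distance, which is a reasonably good fit for a single predictor. The residual standard error of 15.38 on 48 degrees of freedom indicates that observed distances typically differ from the fitted line by roughly 15 ft.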
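To make the quality measures less of a black box, the following sketch recomputes them from the residuals using the standard formulas. The variable names (`ss_res`, `ss_tot`, and so on) are introduced here for illustration only.

```r
# Refit the model so this snippet runs on its own
linear_model <- lm(dist ~ speed, data = cars)

res <- resid(linear_model)
n <- nrow(cars)  # number of observations (50)
p <- 1           # number of predictors (speed)

# Residual and total sums of squares
ss_res <- sum(res^2)
ss_tot <- sum((cars$dist - mean(cars$dist))^2)

# Multiple R-squared: share of the variance explained by the model
r2 <- 1 - ss_res / ss_tot

# Adjusted R-squared: penalizes the number of predictors
adj_r2 <- 1 - (1 - r2) * (n - 1) / (n - p - 1)

# Residual standard error on n - p - 1 = 48 degrees of freedom
rse <- sqrt(ss_res / (n - p - 1))

c(r2 = r2, adj_r2 = adj_r2, rse = rse)
```

The three values should agree with the `Multiple R-squared`, `Adjusted R-squared`, and `Residual standard error` lines of the `summary(linear_model)` output.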

Residual Analysis

plot(fitted(linear_model), resid(linear_model))

qqnorm(resid(linear_model))
qqline(resid(linear_model))

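Based on the residual analysis, the linear model appears appropriate. In the normal Q-Q plot, most of the points fall close to the straight line, which means the residuals are approximately normally distributed. Because the largest positive residual (43.201) is bigger in magnitude than the largest negative one (-29.069), some departure from the line in the upper tail is to be expected. In the residuals-versus-fitted plot, the points should scatter around zero without a clear pattern if the linear form and the constant-variance assumption hold.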
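As a possible complement to the two plots above, base R can draw its standard set of diagnostic plots for an `lm` object, and standardized residuals can be used to flag unusual observations. The cutoff of 2 used below is a common rule of thumb and an assumption of this sketch, not something taken from the exercise.

```r
# Refit the model so this snippet runs on its own
linear_model <- lm(dist ~ speed, data = cars)

# Built-in diagnostic plots for an lm object, arranged in a 2 x 2 grid
op <- par(mfrow = c(2, 2))
plot(linear_model)
par(op)  # restore the previous graphics settings

# Standardized residuals; |value| > 2 is a rule-of-thumb flag (assumed cutoff)
std_res <- rstandard(linear_model)
which(abs(std_res) > 2)
```

Any observations flagged this way are worth a closer look, but they are not automatically outliers to be removed.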