Using the “cars” dataset in R, build a linear model for stopping distance as a function of speed and replicate the analysis of your textbook Chapter 3 (visualization, quality evaluation of the model, and residual analysis).

There is no need to load the cars data set because it is available in base R as a data frame called “cars”. Per the text, the first step is to visualize the data. The \(x\) axis shows the independent variable (speed) and the \(y\) axis shows the dependent variable (stopping distance).
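
Before fitting anything, it can help to take a quick (optional) look at the data frame to confirm its structure; head() and str() are base R functions:

# Inspect the first rows and the structure of the cars data frame
head(cars)
str(cars)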

# Plot speed vs stopping distance for cars data set
plot(cars, xlab = "Speed (mph)", ylab = "Stopping distance (ft)")

# The plot appears to show that, as we would expect,
# stopping distance increases as speed increases, and the
# relationship looks roughly linear, so a linear model is a
# reasonable starting point
# 
# Next, we generate a linear model and summarize it
cars.lm <- lm(dist ~ speed, data = cars)
summary(cars.lm)
## 
## Call:
## lm(formula = dist ~ speed, data = cars)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -29.069  -9.525  -2.272   9.215  43.201 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -17.5791     6.7584  -2.601   0.0123 *  
## speed         3.9324     0.4155   9.464 1.49e-12 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 15.38 on 48 degrees of freedom
## Multiple R-squared:  0.6511, Adjusted R-squared:  0.6438 
## F-statistic: 89.57 on 1 and 48 DF,  p-value: 1.49e-12
# Our model (from the speed coefficient and the intercept):
# stopping distance = (3.9324 * speed) - 17.5791
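#
# As a quick check of that equation, we can pull the fitted
# coefficients directly and predict stopping distances for a
# couple of arbitrarily chosen speeds (coef() and predict()
# are base R; this check is optional)
coef(cars.lm)
predict(cars.lm, newdata = data.frame(speed = c(10, 20)))
#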
# Plotting again with regression line
plot(cars, xlab = "Speed (mph)", ylab = "Stopping distance (ft)")
abline(cars.lm)

# Comments on the summary information:
# 1. The residual quartiles are roughly symmetric about a
#    median near zero, which is consistent with normally
#    distributed residuals
# 2. The speed coefficient is ~ 9.5 times its standard error
#    (the t value of 9.464), which is good (checked below)
# 3. The p-value for the speed coefficient (1.49e-12) is
#    highly significant
# 4. The p-value for the y-intercept (0.0123) is also
#    significant at the 0.05 level
# 5. The multiple R-squared and adjusted R-squared values
#    (~0.65) indicate that speed explains about 65% of the
#    variation in stopping distance, which is reasonable
#    for a single predictor
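#
# As a numeric check on points 2-4 (these simply recompute
# numbers that summary() already reports), the t value is the
# estimate divided by its standard error, and confint() gives
# 95% confidence intervals for both coefficients
coef(summary(cars.lm))["speed", "Estimate"] /
  coef(summary(cars.lm))["speed", "Std. Error"]
confint(cars.lm)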
#
# Next, we plot residuals
plot(fitted(cars.lm), resid(cars.lm))
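# An optional horizontal reference line at zero makes it easier
# to see whether the residuals are centered and patternless
abline(h = 0, lty = 2)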

# Considering the small size of the data set (50 observations),
# the residuals look fairly evenly scattered about zero with
# no obvious pattern
#
# Lastly, we complete a Q-Q plot
qqnorm(resid(cars.lm))
qqline(resid(cars.lm))

# The Q-Q plot points deviate above the line in the upper
# tail, suggesting a bit of right skew in the residuals,
# but not much
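#
# (Optional) The built-in plot method for lm objects produces a
# standard set of diagnostic plots; its first two panels are
# essentially the residuals-vs-fitted and Q-Q plots drawn above
plot(cars.lm)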