DATA605 Homework 11

Using the “cars” dataset in R, build a linear model for stopping distance as a function of speed and replicate the analysis from chapter 3 of your textbook (visualization, quality evaluation of the model, and residual analysis).

Build the linear model:

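# The cars dataset ships with R's built-in datasets package
library(datasets)
# Regress stopping distance (ft) on speed (mph)
SD.Speed.lm <- lm(cars$dist ~ cars$speed)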

Visualization

# Scatter plot of the data with the fitted regression line overlaid
plot(cars, main = "Cars Data", xlab = "Speed (mph)", ylab = "Stopping distance (ft)")
abline(SD.Speed.lm)

Quality evaluation of the model

# Coefficient estimates, standard errors, and goodness-of-fit statistics
summary(SD.Speed.lm)
## 
## Call:
## lm(formula = cars$dist ~ cars$speed)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -29.069  -9.525  -2.272   9.215  43.201 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -17.5791     6.7584  -2.601   0.0123 *  
## cars$speed    3.9324     0.4155   9.464 1.49e-12 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 15.38 on 48 degrees of freedom
## Multiple R-squared:  0.6511, Adjusted R-squared:  0.6438 
## F-statistic: 89.57 on 1 and 48 DF,  p-value: 1.49e-12

Least squares regression line for the linear model: \[ \widehat{\text{Stopping Distance}} = -17.5791 + 3.9324 \times \text{Speed} \]
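As a quick illustration of how the fitted line is used, the sketch below predicts stopping distance at 20 mph. It refits the same regression with the `data = cars` form of the formula, an assumption made here only because `predict()` needs plain variable names to match new data; the names `fit`, `new_speed`, `predicted`, and `by_hand` are hypothetical and not part of the original analysis.

```r
# Refit the same regression using the data argument (an assumed variant of
# the original call) so that predict() can match the column name in newdata.
fit <- lm(dist ~ speed, data = cars)

# Hypothetical speed (mph) at which to predict stopping distance
new_speed <- data.frame(speed = 20)

# Prediction from the fitted line
predicted <- predict(fit, newdata = new_speed)

# The same value computed by hand from the estimated coefficients
by_hand <- coef(fit)[["(Intercept)"]] + coef(fit)[["speed"]] * 20

print(predicted)
print(by_hand)
```

With the rounded coefficients from the summary, this is -17.5791 + 3.9324 × 20, or roughly 61.07 ft.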

\[ R^2 = 0.6511 \]

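The residuals have a median slightly below zero (-2.272). The minimum (-29.069) and maximum (43.201) are somewhat out of balance, but the first and third quartiles (-9.525 and 9.215) are nearly equal in magnitude, which suggests that the bulk of the residuals is roughly symmetric about zero.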
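The five-number summary of the residuals can be pulled directly from the model object rather than read off the printout. The following is a minimal sketch; the names `res` and `res_quantiles` are introduced here for illustration.

```r
# Rebuild the model so the block is self-contained
SD.Speed.lm <- lm(cars$dist ~ cars$speed)

# Extract the residuals and compute their quartiles
res <- resid(SD.Speed.lm)
res_quantiles <- quantile(res)
print(round(res_quantiles, 3))

# With an intercept in the model, least squares residuals average to zero
# (up to floating-point error), even though the median is below zero.
print(mean(res))
```

A median below a mean of zero is consistent with a mild right skew, which matches the larger magnitude of the maximum residual.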

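The standard error for the speed coefficient is 0.4155, about 9.46 times smaller than the corresponding coefficient estimate of 3.9324. A ratio this large indicates that the slope is estimated with relatively little variability.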
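The ratio quoted here is the t value reported in the summary: each coefficient estimate divided by its standard error. A minimal sketch of that arithmetic, using the coefficient table returned by `coef(summary(...))` (the variable names are illustrative):

```r
# Rebuild the model so the block is self-contained
SD.Speed.lm <- lm(cars$dist ~ cars$speed)

# Coefficient table: one row per term, columns for estimate, SE, t, p
coef_table <- coef(summary(SD.Speed.lm))

estimate  <- coef_table[, "Estimate"]
std_error <- coef_table[, "Std. Error"]

# t value computed by hand as estimate / standard error
t_by_hand <- estimate / std_error

print(t_by_hand)
print(coef_table[, "t value"])
```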

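The p-value for speed is \(1.49 \times 10^{-12}\): if speed truly had no linear relationship with stopping distance, a t value this extreme would almost never occur, so speed is clearly relevant to the model. The p-value for the intercept is 0.0123, or about 1.2%, so the intercept is also significant at the 5% level.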
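The p-values in the coefficient table are two-sided tail probabilities of a t distribution with the residual degrees of freedom (48 here). The sketch below reproduces them from the t values; `p_by_hand` and the other variable names are introduced only for illustration.

```r
# Rebuild the model so the block is self-contained
SD.Speed.lm <- lm(cars$dist ~ cars$speed)

coef_table <- coef(summary(SD.Speed.lm))
t_values <- coef_table[, "t value"]

# Residual degrees of freedom: 50 observations minus 2 estimated coefficients
df <- df.residual(SD.Speed.lm)

# Two-sided p-value: probability of a |t| at least this large
# when the true coefficient is zero
p_by_hand <- 2 * pt(abs(t_values), df = df, lower.tail = FALSE)

print(p_by_hand)
print(coef_table[, "Pr(>|t|)"])
```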

The \(R^2\) value of 0.6511 means that the model explains 65.11% of the variation in stopping distance.
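To make the "proportion of variation explained" reading concrete, \(R^2\) can be recomputed as one minus the ratio of the residual sum of squares to the total sum of squares. This is a sketch with illustrative variable names; for a regression with a single predictor the result also equals the squared correlation between speed and distance.

```r
# Rebuild the model so the block is self-contained
SD.Speed.lm <- lm(cars$dist ~ cars$speed)

# Variation left unexplained by the model
ss_res <- sum(resid(SD.Speed.lm)^2)

# Total variation of stopping distance around its mean
ss_tot <- sum((cars$dist - mean(cars$dist))^2)

r2_by_hand  <- 1 - ss_res / ss_tot
r2_reported <- summary(SD.Speed.lm)$r.squared
r2_from_cor <- cor(cars$speed, cars$dist)^2

print(c(by_hand = r2_by_hand, reported = r2_reported, from_cor = r2_from_cor))
```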

Residual Analysis

# Residuals versus fitted values, with labeled axes
plot(fitted(SD.Speed.lm), resid(SD.Speed.lm), xlab = "Fitted values", ylab = "Residuals")

qqnorm(resid(SD.Speed.lm))
qqline(resid(SD.Speed.lm))

In the residuals-versus-fitted plot, the spread of the residuals increases slightly as we move to the right, but the points are scattered fairly evenly above and below zero with no obvious curve. Overall, this plot suggests that a linear model is a reasonable fit for these data.

In the Q-Q plot, the residuals are roughly normally distributed, with the exception of a few strays at the upper tail. The model is not perfect, but it is a reasonable starting point for predicting stopping distance.
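One way to see which observations produce the strays at the upper tail is to list the largest standardized residuals. This is only a sketch: the choice of the top three points is arbitrary, and the names `std_res` and `largest` are introduced here for illustration.

```r
# Rebuild the model so the block is self-contained
SD.Speed.lm <- lm(cars$dist ~ cars$speed)

# Standardized residuals put all points on a comparable scale
std_res <- rstandard(SD.Speed.lm)

# The three largest positive standardized residuals (arbitrary cutoff)
largest <- head(sort(std_res, decreasing = TRUE), 3)
print(largest)

# The corresponding rows of the data, looked up by row number
print(cars[as.integer(names(largest)), ])
```

These are the cars whose stopping distance the model underestimates the most.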