Using the “cars” dataset in R, build a linear model for stopping distance as a function of speed and replicate the analysis in Chapter 3 of your textbook (visualization, quality evaluation of the model, and residual analysis).

Model Creation

We fit a linear model in which stopping distance (dist) is predicted by car speed (speed).

cars.lm <- lm(cars$dist ~ cars$speed)
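A common alternative, sketched here as an assumption rather than the approach used in this write-up, is to pass the data frame through the data argument of lm(). The names fit, new_speeds, and pred are hypothetical. The estimates are the same, and predict() can then be given new speeds, which is awkward when the formula refers to cars$speed directly.

```r
# Hypothetical refit using the data argument; `fit` is an illustrative name.
fit <- lm(dist ~ speed, data = cars)

# Same estimates as cars.lm, but the terms are labelled by column name.
coef(fit)

# Predicted stopping distance at speeds of 10 and 20
# (`new_speeds` is an illustrative data frame, not part of the assignment).
new_speeds <- data.frame(speed = c(10, 20))
pred <- predict(fit, newdata = new_speeds)
pred
```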

Model Visualization

plot(cars$dist ~ cars$speed)
abline(cars.lm)

Model Evaluation

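Based on the summary statistics of the residuals, the residuals look roughly symmetric about zero, which is consistent with a normal distribution. The Median is -2.272, which is close to zero relative to the spread. Q1 (-9.525) and Q3 (9.215) are almost identical in magnitude. The Max (43.201) is about 50% larger in magnitude than the Min (-29.069), which hints at a longer right tail, but I think it is still close enough.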

The Standard Error for our Intercept is about 2.6 times smaller than the magnitude of its Estimate, while the Standard Error for speed is about 9.5 times smaller, which is what we want to see. We don’t have to calculate these ratios by hand because they are shown for us as the ‘t value’.
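As a quick check of that relationship, the ratios can be recomputed from the coefficient table. This is a minimal sketch; fit, coefs, and t_by_hand are illustrative names, and the model is refit so the block runs on its own.

```r
# Refit the same model so this block is self-contained.
fit <- lm(cars$dist ~ cars$speed)

# Coefficient table as a numeric matrix.
coefs <- coef(summary(fit))

# Estimate divided by standard error reproduces the reported t value.
t_by_hand <- coefs[, "Estimate"] / coefs[, "Std. Error"]
cbind(t_by_hand, reported = coefs[, "t value"])
```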

We can reject the null hypothesis that the true coefficient is zero for both the intercept (p = 0.0123) and the speed coefficient (p = 1.49e-12) at the 5% significance level (95% confidence level).
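An equivalent way to look at the same conclusion, sketched here with the base R function confint(), is to check that neither 95% confidence interval contains zero. The names fit, ci, and excludes_zero are illustrative.

```r
# Refit the same model so this block is self-contained.
fit <- lm(cars$dist ~ cars$speed)

# 95% confidence intervals for the intercept and the slope.
ci <- confint(fit, level = 0.95)
ci

# TRUE when the whole interval lies on one side of zero.
excludes_zero <- ci[, 1] > 0 | ci[, 2] < 0
excludes_zero
```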

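Our Q1 and Q3 values are smaller in magnitude than our RSE of 15.38. That is expected: for normally distributed residuals the quartiles fall at roughly ±0.67 times the standard deviation, or about ±10.4 here, so the observed -9.525 and 9.215 are reasonably consistent with my earlier assessment.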
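One way to put numbers on the comparison between the quartiles and the RSE is to compute where the quartiles of a normal distribution with standard deviation equal to the RSE would fall. This is only a rough sketch: it treats the RSE as the standard deviation of the residuals, and the names fit, rse, expected_q, and observed_q are illustrative.

```r
# Refit the same model so this block is self-contained.
fit <- lm(cars$dist ~ cars$speed)

# Residual standard error as reported by summary().
rse <- summary(fit)$sigma

# Quartiles of a normal distribution with mean 0 and sd = rse.
expected_q <- qnorm(c(0.25, 0.75)) * rse

# Quartiles of the actual residuals.
observed_q <- quantile(resid(fit), c(0.25, 0.75))

rbind(expected = expected_q, observed = observed_q)
```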

We have 48 degrees of freedom because we have 50 observations and 2 estimated coefficients (the intercept and the slope).

The Multiple R-squared is the variation explained by our linear model divided by the total variation. We can see that 65% of the variation in stopping distance is explained by speed. The Adjusted R-squared is the R-squared with a penalty for the number of predictors. Our model has only one predictor, so the adjustment is small (0.6511 versus 0.6438).
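The degrees of freedom and both R-squared values can be recomputed from their definitions as a sanity check. This is a minimal sketch using the usual formulas for a least-squares fit with an intercept; the names fit, n, p, rss, tss, r2, and adj_r2 are illustrative.

```r
# Refit the same model so this block is self-contained.
fit <- lm(cars$dist ~ cars$speed)

n <- nrow(cars)          # number of observations
p <- length(coef(fit))   # number of estimated coefficients

# Residual and total sums of squares.
rss <- sum(resid(fit)^2)
tss <- sum((cars$dist - mean(cars$dist))^2)

# R-squared and its adjusted version.
r2 <- 1 - rss / tss
adj_r2 <- 1 - (1 - r2) * (n - 1) / (n - p)

c(df = n - p, r2 = r2, adj_r2 = adj_r2)
```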

summary(cars.lm)
## 
## Call:
## lm(formula = cars$dist ~ cars$speed)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -29.069  -9.525  -2.272   9.215  43.201 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -17.5791     6.7584  -2.601   0.0123 *  
## cars$speed    3.9324     0.4155   9.464 1.49e-12 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 15.38 on 48 degrees of freedom
## Multiple R-squared:  0.6511, Adjusted R-squared:  0.6438 
## F-statistic: 89.57 on 1 and 48 DF,  p-value: 1.49e-12

Residual Analysis

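We do not see an evident pattern in the residuals, and the points are spread fairly evenly along the X axis. They are not balanced above and below zero, however: the positive residuals reach farther from zero than the negative ones.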

plot(fitted(cars.lm), resid(cars.lm))
abline(h = 0)
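The balance of the residuals around zero can also be checked numerically. This is a minimal sketch with illustrative names (fit, res, n_neg, n_pos); the mean of the residuals is zero by construction for a model with an intercept, so the counts on each side are the more informative part.

```r
# Refit the same model so this block is self-contained.
fit <- lm(cars$dist ~ cars$speed)
res <- resid(fit)

# How many residuals fall below and above zero.
n_neg <- sum(res < 0)
n_pos <- sum(res > 0)

# The mean is numerically zero; the counts show any imbalance.
c(negative = n_neg, positive = n_pos, mean = mean(res))
```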

### Q-Q Plot

The quantile versus quantile plot gives another check on the normality of our residuals. If they are normally distributed, we expect them to plot along a straight line.

These residuals look normal except at the right tail, where the departure is modest. The right tail is a little heavier than we would expect.

qqnorm(resid(cars.lm))
qqline(resid(cars.lm))
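To look at the right tail in numbers, the largest residuals can be compared with the theoretical normal quantiles that qqnorm() plots them against. This is a rough sketch: it scales the residuals by the RSE, whereas qqline() draws its reference line through the quartiles, so the comparison is approximate. The names fit, z, and theo are illustrative.

```r
# Refit the same model so this block is self-contained.
fit <- lm(cars$dist ~ cars$speed)

# Residuals scaled by the residual standard error, sorted ascending.
z <- sort(resid(fit) / summary(fit)$sigma)

# Theoretical normal quantiles used by qqnorm(), without drawing a plot.
theo <- sort(qqnorm(resid(fit), plot.it = FALSE)$x)

# The three largest points: a scaled residual well above its theoretical
# quantile indicates a heavier right tail than a normal distribution.
tail(cbind(theoretical = theo, scaled_residual = z), 3)
```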

Lastly, we can plot multiple checks at the same time, including two new ones: the Scale-Location plot, which is another way to examine the residuals (the square root of the absolute standardized residuals against the fitted values), and the Residuals vs Leverage plot, which allows you to look for outliers that have a large influence on the fit.

These two new plots give us no new information. There are no outliers and our residuals don’t appear to violate normality.

par(mfrow=c(2,2))
plot(cars.lm)
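The quantities behind the Residuals vs Leverage panel can also be inspected directly. This is a minimal sketch using the base R functions hatvalues() and cooks.distance(); the names fit, lev, and cook are illustrative. The panel draws Cook's distance contours at 0.5 and 1, so those are convenient reference values when reading the output.

```r
# Refit the same model so this block is self-contained.
fit <- lm(cars$dist ~ cars$speed)

# Leverage of each observation (diagonal of the hat matrix).
lev <- hatvalues(fit)

# Cook's distance: how much the fit changes if an observation is dropped.
cook <- cooks.distance(fit)

# The three most influential observations and the range of leverage values.
head(sort(cook, decreasing = TRUE), 3)
range(lev)
```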