Using the “cars” dataset in R, build a linear model for stopping distance as a function of speed and replicate the analysis from chapter 3 of your textbook (visualization, quality evaluation of the model, and residual analysis).


Preliminary look at the cars data set.


head(cars)
##   speed dist
## 1     4    2
## 2     4   10
## 3     7    4
## 4     7   22
## 5     8   16
## 6     9   10
tail(cars)
##    speed dist
## 45    23   54
## 46    24   70
## 47    24   92
## 48    24   93
## 49    24  120
## 50    25   85
summary(cars)
##      speed           dist       
##  Min.   : 4.0   Min.   :  2.00  
##  1st Qu.:12.0   1st Qu.: 26.00  
##  Median :15.0   Median : 36.00  
##  Mean   :15.4   Mean   : 42.98  
##  3rd Qu.:19.0   3rd Qu.: 56.00  
##  Max.   :25.0   Max.   :120.00
plot(cars$speed,cars$dist, xlab='Car Speed', ylab='Stopping Distance')


Linear Model.


carslm <- lm(cars$dist ~ cars$speed)
plot(cars$speed, cars$dist, xlab='Car Speed', ylab='Stopping Distance')
abline(carslm)

summary(carslm)
## 
## Call:
## lm(formula = cars$dist ~ cars$speed)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -29.069  -9.525  -2.272   9.215  43.201 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -17.5791     6.7584  -2.601   0.0123 *  
## cars$speed    3.9324     0.4155   9.464 1.49e-12 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 15.38 on 48 degrees of freedom
## Multiple R-squared:  0.6511, Adjusted R-squared:  0.6438 
## F-statistic: 89.57 on 1 and 48 DF,  p-value: 1.49e-12


An initial look at the scatter plot suggests a positive relationship between car speed and stopping distance: the higher the speed, the longer the stopping distance.
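To put a number on the relationship seen in the scatter plot, the following minimal sketch computes the Pearson correlation between the two columns. It is an addition for illustration, not a step from the textbook chapter, and the variable name `r` is chosen here.

```r
# Pearson correlation between speed and stopping distance
# in the built-in cars data set (`r` is an illustrative name).
r <- cor(cars$speed, cars$dist)
print(round(r, 4))

# For a model with a single predictor, the squared correlation
# equals the Multiple R-squared reported by summary().
print(round(r^2, 4))
```

A value close to 1 points to a strong positive linear association, which is consistent with the upward trend in the plot.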


The summary shows that the median of the residuals (-2.272) is close to zero. The standard error of the speed coefficient (0.4155) is about 9 times smaller than the coefficient itself (3.9324), which is within the range expected for a good model. The multiple and adjusted R-squared values (0.6511 and 0.6438) indicate that about 65% of the variation in stopping distance is explained by speed; the higher this value, the better. Lastly, the p-value for speed (1.49e-12) is very low, which indicates that speed likely has an influence on stopping distance.
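The quantities discussed above can also be read directly from the summary object rather than from the printed output. The sketch below assumes a refit of the same model using the `data` argument; the names `fit`, `s`, `coefs`, `se_ratio`, `r2`, `adj_r2`, and `p_speed` are introduced here for illustration only.

```r
# Refit the same model with the data argument (illustrative name `fit`);
# the coefficient row is then labelled "speed" instead of "cars$speed".
fit <- lm(dist ~ speed, data = cars)
s <- summary(fit)

# Coefficient matrix with columns: Estimate, Std. Error, t value, Pr(>|t|)
coefs <- coef(s)

# Ratio of the slope estimate to its standard error (this is the t value)
se_ratio <- coefs["speed", "Estimate"] / coefs["speed", "Std. Error"]

# Goodness-of-fit measures and the p-value for the speed coefficient
r2 <- s$r.squared
adj_r2 <- s$adj.r.squared
p_speed <- coefs["speed", "Pr(>|t|)"]

print(c(se_ratio = se_ratio, r2 = r2, adj_r2 = adj_r2, p_speed = p_speed))
```

The ratio printed here is the same number as the t value in the coefficient table, so the "about 9 times smaller" comparison can be read straight from that column.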


Residuals


plot(fitted(carslm), resid(carslm))
abline(h = 0)

qqnorm(resid(carslm))
qqline(resid(carslm))


Based on the plots above and on the JBSTATISTICS video series, the residuals do not show curvature or any other severe pattern. There are also no regions of noticeably larger or smaller variability; the residuals appear to be evenly spread around zero. The residuals therefore give no indication that the assumptions of the model are violated, and the model appears to be good.
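As a rough numerical companion to the visual check, the sketch below compares the spread of the residuals for the lower and upper halves of the fitted values. This is an informal check added for illustration, not a formal test and not part of the textbook procedure; the names `fit`, `res`, `fv`, `lower`, `upper`, and `spread` are chosen here.

```r
# Refit the model so the block is self-contained (illustrative name `fit`)
fit <- lm(dist ~ speed, data = cars)
res <- resid(fit)
fv <- fitted(fit)

# Split residuals at the median fitted value
lower <- res[fv <= median(fv)]
upper <- res[fv > median(fv)]

# Standard deviation of residuals in each half, plus the overall mean,
# which is zero up to rounding for a least-squares fit with an intercept
spread <- c(lower_sd = sd(lower), upper_sd = sd(upper), mean_resid = mean(res))
print(round(spread, 3))
```

Similar standard deviations in the two halves would support the reading of constant variability; a clear difference would suggest looking at the residual plot again.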