Using the “cars” dataset in R, build a linear model for stopping distance as a function of speed and replicate the analysis of chapter 3 of your textbook (visualization, quality evaluation of the model, and residual analysis).
We are asked to work with the “cars” dataset that ships with R and is available under the variable name cars. The dataset has 50 rows; its first six rows are shown below:
print(head(cars))
##   speed dist
## 1     4    2
## 2     4   10
## 3     7    4
## 4     7   22
## 5     8   16
## 6     9   10
There are only two columns in cars: speed (in mph) and dist (stopping distance, in ft).
Next, we can draw a basic scatter plot of the two columns:
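As a quick check of the description above, the structure of the data frame can be inspected directly. This is a minimal sketch using only base R; the variable names dims and col_names are illustrative and not part of the original analysis.

```r
# Inspect the built-in cars data frame (datasets package, attached by default)
dims <- dim(cars)          # number of rows and columns
col_names <- names(cars)   # column names
print(dims)
print(col_names)
str(cars)                  # type of each column
print(summary(cars))       # quartiles and mean for each column
```

The help page ?cars documents the units of both columns.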
plot(cars, main='Cars Speed vs. Distance')
The scatter plot above indicates a strong positive relationship between speed and dist: the higher the speed, the longer the stopping distance, with only one or two outlying points.
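The visual impression of a strong relationship can be backed by a number. The sketch below computes the Pearson correlation and lists the rows with the largest distances as candidate outliers; treating the top three rows as candidates is an assumption made for illustration, not a formal outlier test.

```r
# Pearson correlation between speed and stopping distance
r <- cor(cars$speed, cars$dist)
print(round(r, 3))

# Rows with the three largest stopping distances (candidate outliers only)
top_rows <- cars[order(cars$dist, decreasing = TRUE)[1:3], ]
print(top_rows)
```

A value of r close to 1 supports the reading that distance increases with speed.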
We can then fit a linear model to the two variables and plot it, with the red line showing the fitted trend. As written, the formula cars$speed ~ cars$dist treats speed as the response and dist as the predictor:
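# speed ~ dist puts dist on the x-axis and speed on the y-axis
plot(cars$speed~cars$dist, xlab='Distance', ylab='Speed', main='Cars Speed vs. Distance')
carslm <- lm(cars$speed~cars$dist)
# Overlay the fitted regression line
abline(carslm, col='red')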
This scatterplot has its axes swapped relative to the first one (distance is now on the horizontal axis), but it supports the same statement: the red fitted line follows the trend of the data, with higher speeds going along with longer distances.
We can now look at the summary of carslm and the diagnostic plots for its residuals, as in chapter 3:
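The task statement asks for stopping distance as a function of speed. A minimal sketch of a fit in that direction is given here for comparison; the name distlm is hypothetical and is not used elsewhere in this analysis.

```r
# Fit stopping distance as a function of speed (response ~ predictor)
distlm <- lm(dist ~ speed, data = cars)   # distlm is an illustrative name

# Scatter plot with speed on the x-axis and distance on the y-axis
plot(dist ~ speed, data = cars, xlab = 'Speed', ylab = 'Distance',
     main = 'Cars Distance vs. Speed')
abline(distlm, col = 'red')               # overlay the fitted line

print(coef(distlm))                       # intercept and slope
```

The slope of this model is the estimated change in stopping distance (ft) for each additional mph of speed.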
summary(carslm)
##
## Call:
## lm(formula = cars$speed ~ cars$dist)
##
## Residuals:
##     Min      1Q  Median      3Q     Max
## -7.5293 -2.1550  0.3615  2.4377  6.4179
##
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept)  8.28391    0.87438   9.474 1.44e-12 ***
## cars$dist    0.16557    0.01749   9.464 1.49e-12 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 3.156 on 48 degrees of freedom
## Multiple R-squared: 0.6511, Adjusted R-squared: 0.6438
## F-statistic: 89.57 on 1 and 48 DF, p-value: 1.49e-12
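The quality measures in the summary can also be extracted programmatically, which makes them easier to check. The sketch below refits the same model under the hypothetical name fit so that it runs on its own. It relies on two standard facts: the t value is the estimate divided by its standard error, and for a regression with one predictor R-squared equals the squared correlation between the two variables.

```r
# Refit the same model so this block is self-contained (fit is an illustrative name)
fit <- lm(cars$speed ~ cars$dist)
s <- summary(fit)

coefs <- coef(s)   # matrix with Estimate, Std. Error, t value, Pr(>|t|)
t_slope <- coefs[2, "Estimate"] / coefs[2, "Std. Error"]   # t value of the slope
r2 <- s$r.squared                                          # Multiple R-squared
cor_sq <- cor(cars$speed, cars$dist)^2                     # squared correlation

print(coefs)
print(c(t_slope = t_slope, r2 = r2, cor_sq = cor_sq))
print(s$sigma)     # residual standard error
```

An R-squared of about 0.65 means the model explains roughly 65% of the variance in the response.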
# Do not pause for a keypress between plots
par(ask=FALSE)
# Arrange the four diagnostic plots in a 2x2 grid
par(mfrow=c(2,2))
plot(carslm)
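To complement the four diagnostic panels, the residuals can also be examined directly. This is a minimal sketch: the model is refit under the hypothetical name fit so the block is self-contained, and the Shapiro-Wilk test is one common normality check chosen here for illustration, not necessarily the one used in the textbook chapter.

```r
# Refit the same model so this block is self-contained (fit is an illustrative name)
fit <- lm(cars$speed ~ cars$dist)
res <- residuals(fit)

print(summary(res))                                  # residuals should be centered near zero
hist(res, main = 'Residuals', xlab = 'Residual')     # rough check of the distribution shape
qqnorm(res)                                          # normal Q-Q plot of the residuals
qqline(res)                                          # reference line for the Q-Q plot
print(shapiro.test(res))                             # Shapiro-Wilk normality test
```

Residuals that scatter evenly around zero and follow the Q-Q reference line suggest that the linear model is a reasonable description of the data.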