- Using the cars dataset in R, build a linear model for stopping distance as a function of speed and replicate the analysis of your textbook chapter 3 (visualization, quality evaluation of the model, and residual analysis).
EDA
Let’s start with a little exploratory data analysis to make sure we understand the data set. We’ll use the str and summary functions to begin.
## 'data.frame': 50 obs. of 2 variables:
## $ speed: num 4 4 7 7 8 9 10 10 10 11 ...
## $ dist : num 2 10 4 22 16 10 18 26 34 17 ...
##      speed           dist
##  Min.   : 4.0   Min.   :  2.00
##  1st Qu.:12.0   1st Qu.: 26.00
##  Median :15.0   Median : 36.00
##  Mean   :15.4   Mean   : 42.98
##  3rd Qu.:19.0   3rd Qu.: 56.00
##  Max.   :25.0   Max.   :120.00
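The output above can be reproduced with a couple of base R calls. This is a minimal sketch, assuming the built-in cars data frame from the datasets package is the one used here; the write-up does not show the original code.

```r
# cars ships with base R (datasets package), so no extra packages are needed
data(cars)

# Structure: number of observations, variable names, types, first values
str(cars)

# Minimum, quartiles, median, mean, and maximum for each column
summary(cars)
```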
Now let’s plot our variables: speed (explanatory) and stopping distance (response).
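A scatter plot of this kind might be drawn as follows. This is a sketch rather than the original plotting code: the axis labels, title, and the name speed_dist_cor are assumptions added for illustration, with units taken from the cars help page (speed in mph, distance in ft).

```r
data(cars)

# Scatter plot with the explanatory variable on the x-axis
plot(dist ~ speed, data = cars,
     xlab = "Speed (mph)",
     ylab = "Stopping distance (ft)",
     main = "Stopping distance vs. speed")

# Pearson correlation as a quick numeric check on the linear trend
speed_dist_cor <- cor(cars$speed, cars$dist)
print(speed_dist_cor)
```

A clearly positive correlation here is what makes a straight-line model a reasonable first attempt.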

Model Building and Evaluation
Both the intercept and the speed coefficient are statistically significant, though the speed coefficient is the more strongly significant of the two. The R-squared is a respectable 65.11% and the adjusted R-squared is 64.38%. The F-statistic is 89.57.
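The figures quoted above come from the model summary. A minimal sketch of the fit, assuming a simple least-squares model of dist on speed; the object names fit, fit_summary, r2, adj_r2, and f_stat are illustrative and not taken from the original code.

```r
data(cars)

# Simple linear regression: stopping distance as a function of speed
fit <- lm(dist ~ speed, data = cars)

# Coefficient table, R-squared, adjusted R-squared, and F-statistic
fit_summary <- summary(fit)
print(fit_summary)

# Pull out the individual quality measures discussed in the text
r2     <- fit_summary$r.squared
adj_r2 <- fit_summary$adj.r.squared
f_stat <- fit_summary$fstatistic[["value"]]
```

The p-values in the coefficient table are what support the significance statement for the intercept and the slope.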
The Q-Q plot indicates some skew; overall, however, it reflects a near-normal distribution for the residuals.
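The residual checks can be sketched with base graphics. This assumes the same dist ~ speed model as above and refits it so the snippet runs on its own; the names fit and res and the axis labels are illustrative choices.

```r
data(cars)
fit <- lm(dist ~ speed, data = cars)
res <- residuals(fit)

# Residuals vs. fitted values: look for curvature or changing spread
plot(fitted(fit), res,
     xlab = "Fitted stopping distance",
     ylab = "Residual")
abline(h = 0, lty = 2)  # reference line at zero

# Normal Q-Q plot: points near the line suggest roughly normal residuals
qqnorm(res)
qqline(res)
```

Points that drift away from the reference line in one tail are what would show up as the skew mentioned above.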

Conclusion
The model’s coefficient, R-squared, and F-statistic indicate that it does a fairly good job of capturing the relationship between speed and stopping distance. The R-squared shows the model explains approximately 65% of the variance in stopping distance, so adding further variables could improve performance. For a simple one-variable linear model, however, it does well.