1 Regression Problem

Using the “cars” dataset in R, build a linear model for stopping distance as a function of speed and replicate the analysis from chapter 3 of your textbook (visualization, quality evaluation of the model, and residual analysis).

1.1 Import Data

The cars dataset contains 50 observations of two variables, the speed of each car and the distance it took to stop, with no NA values.

##      speed           dist       
##  Min.   : 4.0   Min.   :  2.00  
##  1st Qu.:12.0   1st Qu.: 26.00  
##  Median :15.0   Median : 36.00  
##  Mean   :15.4   Mean   : 42.98  
##  3rd Qu.:19.0   3rd Qu.: 56.00  
##  Max.   :25.0   Max.   :120.00
## [1] 50  2
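
The summary and dimensions above can be reproduced with a few base R calls. This is a minimal sketch; the variable name `n_missing` is illustrative and not taken from the original analysis.

```r
# cars ships with base R (package "datasets"), so nothing needs downloading
data(cars)

# Minimum, quartiles, median, mean and maximum for each column
summary(cars)

# Number of rows and columns
dim(cars)

# Count missing values across the whole data frame (illustrative name)
n_missing <- sum(is.na(cars))
n_missing
```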

1.4 Evaluate the Model

Quality evaluation of the model:

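Residuals: The median (-2.27) is close to zero, and the first and third quartiles (-9.53 and 9.22) are nearly symmetric. The maximum (43.20) is noticeably larger in magnitude than the minimum (-29.07), however, which suggests some right skew in the residuals. Overall, the fit is reasonable but not perfect.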

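P-value: The p-value for speed (1.49e-12) is far below 0.001, so the slope is significantly different from zero at the 99.9% confidence level. The intercept (p = 0.0123) is significant at the 95% level but not at the 99% level. This indicates that both coefficients are statistically distinguishable from zero; it does not by itself show how accurately the model predicts.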

Multiple R-squared: The \(R^{2}\) value measures the proportion of the variation in stopping distance that the model explains. Here speed accounts for 65.11% of that variation, which is fairly good for a model with a single predictor.

## 
## Call:
## lm(formula = dist ~ speed, data = cars)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -29.069  -9.525  -2.272   9.215  43.201 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -17.5791     6.7584  -2.601   0.0123 *  
## speed         3.9324     0.4155   9.464 1.49e-12 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 15.38 on 48 degrees of freedom
## Multiple R-squared:  0.6511, Adjusted R-squared:  0.6438 
## F-statistic: 89.57 on 1 and 48 DF,  p-value: 1.49e-12
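
The output above corresponds to a simple linear regression of `dist` on `speed`. A minimal sketch of how it might be produced follows; the object names `fit` and `fit_summary` are assumptions, not names from the original report.

```r
# Fit stopping distance as a linear function of speed
fit <- lm(dist ~ speed, data = cars)

# Residual quantiles, coefficient table, R-squared and F-statistic
fit_summary <- summary(fit)
fit_summary

# Individual quantities can be extracted for reporting
coef(fit)                                # intercept and slope
fit_summary$r.squared                    # share of variance in dist explained
fit_summary$coefficients[, "Pr(>|t|)"]   # p-value for each coefficient
```

Read off the coefficient table, the fitted line is roughly dist = -17.58 + 3.93 × speed, so each additional unit of speed is associated with about 3.93 more units of stopping distance.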

1.5 Residual Analysis

The residuals vary around zero but are not evenly scattered above and below it. Apart from that, they do not show any clear pattern.
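
A residuals-versus-fitted plot of the kind described here can be drawn with base graphics. The sketch below refits the model so that it runs on its own; the object name `fit`, the axis labels and the title are illustrative choices.

```r
# Refit the model so this block is self-contained
fit <- lm(dist ~ speed, data = cars)

# Residuals against fitted values; a well-behaved model scatters
# evenly around the horizontal zero line with no trend or funnel shape
plot(fitted(fit), resid(fit),
     xlab = "Fitted stopping distance",
     ylab = "Residual",
     main = "Residuals vs fitted values")
abline(h = 0, lty = 2)  # dashed reference line at zero
```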

The Q-Q plot shows several points lying above the reference line at both the upper and the lower end. The residuals may therefore deviate from a normal distribution in the tails.
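
A normal Q-Q plot of the residuals might be produced as follows. This is a sketch that refits the model for self-containment; `fit` and `res` are assumed names.

```r
# Refit the model so this block is self-contained
fit <- lm(dist ~ speed, data = cars)
res <- resid(fit)

# Sample quantiles of the residuals against theoretical normal quantiles
qqnorm(res, main = "Normal Q-Q plot of residuals")

# Reference line through the first and third quartiles;
# points far from it indicate departures from normality
qqline(res)
```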

To check the normality, we can also look at the histogram or a normal probability plot of the residuals.

The histogram is bimodal and slightly right-skewed, so the residuals are only roughly normal.
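
A histogram of the residuals can be sketched as below. The number of bins requested through `breaks` is an arbitrary choice made for illustration, and the apparent shape of the histogram can change with it.

```r
# Refit the model so this block is self-contained
fit <- lm(dist ~ speed, data = cars)
res <- resid(fit)

# Histogram of residuals; breaks = 10 is only a suggestion to hist(),
# which may pick a nearby "pretty" number of bins
h <- hist(res, breaks = 10,
          xlab = "Residual",
          main = "Histogram of residuals")

# Bin counts, useful for checking the shape numerically
h$counts
```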