Using the “cars” dataset in R, build a linear model for stopping distance as a function of speed and replicate the analysis of your textbook chapter 3 (visualization, quality evaluation of the model, and residual analysis).

The help page for cars (?cars) describes the dataset as follows:

Speed and Stopping Distances of Cars

Description

The data give the speed of cars and the distances taken to stop. Note that the data were recorded in the 1920s.

Usage

cars

Format

A data frame with 50 observations on 2 variables.

[,1] speed numeric Speed (mph)

[,2] dist numeric Stopping distance (ft)

Importing the dataset

dataset_cars <- cars

summary(dataset_cars)
##      speed           dist       
##  Min.   : 4.0   Min.   :  2.00  
##  1st Qu.:12.0   1st Qu.: 26.00  
##  Median :15.0   Median : 36.00  
##  Mean   :15.4   Mean   : 42.98  
##  3rd Qu.:19.0   3rd Qu.: 56.00  
##  Max.   :25.0   Max.   :120.00

library(ggplot2)

# Scatter plot of stopping distance against speed
ggplot(dataset_cars, aes(x = speed, y = dist)) +
  geom_point(color = 'red') +
  ggtitle('speed and stopping distance of cars') +
  xlab('speed (mph)') +
  ylab('stopping distance (ft)')

At first glance, there appears to be a positive linear relationship between the speed of the car and the stopping distance. Let us check how closely the two variables are related and look at the statistics of that relationship.

We define a linear model of stopping distance as a function of speed using the lm function, and then evaluate the model.

regressor <- lm(formula = dist ~ speed, 
                data = dataset_cars)

regressor
## 
## Call:
## lm(formula = dist ~ speed, data = dataset_cars)
## 
## Coefficients:
## (Intercept)        speed  
##     -17.579        3.932

The output gives the two coefficients of the fitted line:

Intercept - the predicted stopping distance when the speed is 0. Its value is -17.579. A negative stopping distance is not a physically valid scenario, but the intercept is still needed to position the line.

Slope - the change in the dependent variable for every unit change in the independent variable. Its value is 3.932, which means that for every 1 mph increase or decrease in speed, the predicted stopping distance increases or decreases by 3.932 ft.
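As a cross-check on where the intercept and slope come from, the sketch below computes them from the standard closed-form least-squares formulas (slope = cov(x, y) / var(x), intercept = mean(y) - slope * mean(x)) and compares them with the lm result. The variable names x, y, slope and intercept are chosen here for illustration only.

```r
# Closed-form least-squares estimates for dist ~ speed
# (x, y, slope and intercept are illustrative names, not from the original analysis)
dataset_cars <- cars
x <- dataset_cars$speed
y <- dataset_cars$dist

slope <- cov(x, y) / var(x)             # change in dist per 1 mph
intercept <- mean(y) - slope * mean(x)  # the line passes through (mean(x), mean(y))

# Fit the same model with lm and compare the two sets of coefficients
regressor <- lm(formula = dist ~ speed, data = dataset_cars)
same <- all.equal(unname(coef(regressor)), c(intercept, slope))

c(intercept = intercept, slope = slope)
same
```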

Stopping distance (ft) = -17.579 + 3.932 × speed (mph)
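To illustrate how the fitted equation is used, the following sketch predicts the stopping distance for two speeds with predict() and repeats the calculation by hand from the coefficients. The speeds 10 and 20 mph and the names new_speeds, predicted and by_hand are assumptions made for this example.

```r
# Predict stopping distance from the fitted line
dataset_cars <- cars
regressor <- lm(formula = dist ~ speed, data = dataset_cars)

# Hypothetical speeds (mph) chosen for illustration
new_speeds <- data.frame(speed = c(10, 20))

# Prediction using the model object
predicted <- predict(regressor, newdata = new_speeds)

# The same prediction written out as intercept + slope * speed
b <- coef(regressor)
by_hand <- b[["(Intercept)"]] + b[["speed"]] * new_speeds$speed

cbind(new_speeds, predicted, by_hand)
```

Both columns should agree, since predict() evaluates the same linear equation. Predictions for speeds outside the observed range of 4 to 25 mph are extrapolations and should be treated with caution.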

Evaluating the model

summary(regressor)
## 
## Call:
## lm(formula = dist ~ speed, data = dataset_cars)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -29.069  -9.525  -2.272   9.215  43.201 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -17.5791     6.7584  -2.601   0.0123 *  
## speed         3.9324     0.4155   9.464 1.49e-12 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 15.38 on 48 degrees of freedom
## Multiple R-squared:  0.6511, Adjusted R-squared:  0.6438 
## F-statistic: 89.57 on 1 and 48 DF,  p-value: 1.49e-12

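The Multiple R-squared of 0.6511 means that speed explains 65.11% of the variation in stopping distance. The slope is highly significant (p-value 1.49e-12), and the residual standard error is 15.38 ft on 48 degrees of freedom.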
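The quality measures reported by summary() can be reproduced from the residuals, which may help in seeing what they measure. This is a minimal sketch using the standard definitions; the names ss_res, ss_tot, r_squared, r_from_cor and rse are illustrative.

```r
# Recompute R-squared and the residual standard error by hand
dataset_cars <- cars
regressor <- lm(formula = dist ~ speed, data = dataset_cars)

ss_res <- sum(resid(regressor)^2)                               # unexplained variation
ss_tot <- sum((dataset_cars$dist - mean(dataset_cars$dist))^2)  # total variation

r_squared <- 1 - ss_res / ss_tot

# With a single predictor, R-squared equals the squared correlation
r_from_cor <- cor(dataset_cars$speed, dataset_cars$dist)^2

# Residual standard error: sqrt(SS_res / residual degrees of freedom)
rse <- sqrt(ss_res / df.residual(regressor))

c(r_squared = r_squared, r_from_cor = r_from_cor, rse = rse)
```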

Plotting the residuals against the fitted values, followed by a normal Q-Q plot of the residuals:

plot(fitted(regressor),resid(regressor))

qqnorm(resid(regressor))
qqline(resid(regressor))
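The two diagnostic plots can also be drawn side by side, with a zero reference line that makes any pattern in the residuals easier to see. This is one possible layout, not the only way to present them; the axis labels and the names res and old_par are assumptions for this sketch.

```r
# Residual diagnostics for the fitted model, drawn side by side
dataset_cars <- cars
regressor <- lm(formula = dist ~ speed, data = dataset_cars)
res <- resid(regressor)

old_par <- par(mfrow = c(1, 2))  # two panels in one row; keep the old settings

# Residuals vs fitted values: ideally scattered evenly around zero
plot(fitted(regressor), res,
     xlab = "Fitted stopping distance (ft)",
     ylab = "Residual (ft)",
     main = "Residuals vs fitted")
abline(h = 0, lty = 2)           # zero reference line

# Normal Q-Q plot: points near the line suggest roughly normal residuals
qqnorm(res)
qqline(res)

par(old_par)                     # restore the previous plotting settings

# Numeric summary of the residuals, matching the Residuals block of summary(regressor)
summary(res)
```

A funnel shape or a curve in the left panel, or points drifting away from the line in the tails of the Q-Q plot, would suggest that the assumptions of constant variance or normally distributed residuals do not fully hold.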