INSTRUCTIONS

Using the “cars” dataset in R, build a linear model for stopping distance as a function of speed and replicate the analysis in chapter 3 of your textbook (visualization, quality evaluation of the model, and residual analysis).

LOAD PACKAGES

pkges <- c("ggplot2", "dplyr")

# Loop through the packages
for (p in pkges) {
  # Install the package only if it is not already installed
  if (!requireNamespace(p, quietly = TRUE)) {
    install.packages(p)
  }
  # Load the package
  library(p, character.only = TRUE)
}

PREVIEW DATA

There is no need to load the dataset because it is built into R (it ships with the datasets package).

The dataset has 2 columns and 50 rows. The explanatory variable in this model is ‘speed’ (in mph) and the response variable is ‘dist’, the stopping distance (in ft).
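The dimensions and column types can be confirmed directly before modeling. A minimal check using only base R (the variable name dims is introduced here for illustration):

```r
# Number of rows and columns in the built-in dataset
dims <- dim(cars)
dims            # 50 rows, 2 columns

# Column names and types
str(cars)

# Five-number summary and mean of each column
summary(cars)
```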

head(cars)
##   speed dist
## 1     4    2
## 2     4   10
## 3     7    4
## 4     7   22
## 5     8   16
## 6     9   10
# Rename the column names
Cars_DF <- cars %>% 
           rename("SPEED" = "speed", "DISTANCE" = "dist")
Cars_DF
##    SPEED DISTANCE
## 1      4        2
## 2      4       10
## 3      7        4
## 4      7       22
## 5      8       16
## 6      9       10
## 7     10       18
## 8     10       26
## 9     10       34
## 10    11       17
## 11    11       28
## 12    12       14
## 13    12       20
## 14    12       24
## 15    12       28
## 16    13       26
## 17    13       34
## 18    13       34
## 19    13       46
## 20    14       26
## 21    14       36
## 22    14       60
## 23    14       80
## 24    15       20
## 25    15       26
## 26    15       54
## 27    16       32
## 28    16       40
## 29    17       32
## 30    17       40
## 31    17       50
## 32    18       42
## 33    18       56
## 34    18       76
## 35    18       84
## 36    19       36
## 37    19       46
## 38    19       68
## 39    20       32
## 40    20       48
## 41    20       52
## 42    20       56
## 43    20       64
## 44    22       66
## 45    23       54
## 46    24       70
## 47    24       92
## 48    24       93
## 49    24      120
## 50    25       85

VISUALIZE THE DATA

A scatter plot of the data helps determine whether a linear relationship exists between the predictor and the response. The plot indicates a roughly linear relationship between speed and distance: as speed increases, stopping distance also increases.

par(bg="gray")
plot(Cars_DF, xlab = "SPEED (MPH)", ylab = "DISTANCE (FT)",
     col="purple", las = 1, main = "STOPPING DISTANCE vs SPEED")
grid()

BUILD THE LINEAR MODEL

Linear_model <- lm(DISTANCE ~ SPEED, data = Cars_DF)
Linear_model
## 
## Call:
## lm(formula = DISTANCE ~ SPEED, data = Cars_DF)
## 
## Coefficients:
## (Intercept)        SPEED  
##     -17.579        3.932
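
The printed coefficients correspond to the fitted line DISTANCE = -17.579 + 3.932 × SPEED. As a minimal sketch of how the model can be used for prediction, the block below refits it on the built-in data (renaming the columns with base R's setNames() rather than dplyr, so the block runs on its own) and predicts the stopping distance at 15 mph. The 15 mph value is an arbitrary illustration and is not part of the original analysis.

```r
# Rebuild the renamed data frame with base R so the example is self-contained
Cars_DF <- setNames(cars, c("SPEED", "DISTANCE"))
Linear_model <- lm(DISTANCE ~ SPEED, data = Cars_DF)

# Predict the stopping distance (ft) at a hypothetical speed of 15 mph
new_speed <- data.frame(SPEED = 15)
pred_15 <- predict(Linear_model, newdata = new_speed)

# The same value computed by hand from the two coefficients
by_hand <- unname(coef(Linear_model)["(Intercept)"] +
                  coef(Linear_model)["SPEED"] * 15)

pred_15   # about 41.4 ft
by_hand
```

Because the intercept is negative, the line predicts negative distances for speeds below about 4.5 mph, so it should not be extrapolated below the observed range of 4 to 25 mph.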

VISUALIZE THE DATA - LINEAR MODEL

par(bg="gray")
plot(Cars_DF, xlab = "SPEED (MPH)", ylab = "DISTANCE (FT)",
     col="purple", las = 1, main = "STOPPING DISTANCE vs SPEED")
abline(Linear_model, col="green")
grid()

EVALUATE THE LINEAR MODEL

Outlined below is a summary of the Linear Model.

summary(Linear_model)
## 
## Call:
## lm(formula = DISTANCE ~ SPEED, data = Cars_DF)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -29.069  -9.525  -2.272   9.215  43.201 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -17.5791     6.7584  -2.601   0.0123 *  
## SPEED         3.9324     0.4155   9.464 1.49e-12 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 15.38 on 48 degrees of freedom
## Multiple R-squared:  0.6511, Adjusted R-squared:  0.6438 
## F-statistic: 89.57 on 1 and 48 DF,  p-value: 1.49e-12
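
The quantities in this summary can also be extracted programmatically, which is convenient when reporting them. A minimal sketch, refitting on the built-in data so it runs on its own; confint() is not used in the original analysis and is shown only as one possible way to quantify the uncertainty in the slope.

```r
# Rebuild the data frame and model so the example is self-contained
Cars_DF <- setNames(cars, c("SPEED", "DISTANCE"))
Linear_model <- lm(DISTANCE ~ SPEED, data = Cars_DF)
model_summary <- summary(Linear_model)

coef_table <- coef(model_summary)      # estimates, std. errors, t values, p-values
r_squared  <- model_summary$r.squared  # 0.6511 in the summary above
resid_se   <- model_summary$sigma      # 15.38 in the summary above

# 95% confidence intervals for the intercept and the slope
conf_int <- confint(Linear_model, level = 0.95)

coef_table
r_squared
resid_se
conf_int
```

An R-squared of 0.6511 means that speed accounts for roughly 65% of the variation in stopping distance. The standard error of the slope (0.4155) is small relative to the estimate (3.9324), which is consistent with the large t value.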

Plot Residuals

par(bg="gray")
plot(fitted(Linear_model), resid(Linear_model),
     xlab = "FITTED VALUES", ylab = "RESIDUALS",
     main = "LINEAR MODEL (RESIDUALS vs FITTED)", col = "purple")
abline(h = 0, col = "green") # Horizontal reference line at zero

Plot Normal Q-Q

par(bg="gray")
qqnorm(resid(Linear_model), col = "purple")
qqline(resid(Linear_model), col = "green")
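
A Q-Q plot is judged by eye, so a numerical check can complement it. The sketch below applies the Shapiro-Wilk test (shapiro.test() from base R's stats package) to the residuals. This test is an addition for illustration and is not part of the original analysis; a small p-value would suggest a departure from normality.

```r
# Rebuild the data frame and model so the example is self-contained
Cars_DF <- setNames(cars, c("SPEED", "DISTANCE"))
Linear_model <- lm(DISTANCE ~ SPEED, data = Cars_DF)
model_resid <- resid(Linear_model)

# Shapiro-Wilk test: the null hypothesis is that the residuals are normal
sw <- shapiro.test(model_resid)
sw$statistic
sw$p.value

# Least-squares residuals from a model with an intercept average to zero
mean(model_resid)
```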

The four default diagnostic plots for the linear model built from the cars data are produced by calling plot() on the model object.

par(mfrow = c(2, 2), oma = c(0, 0, 1.1, 0),
    mar = c(4.1, 4.1, 2.1, 1.1))
plot(Linear_model, col = "purple")

CONCLUSIONS

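My analysis of the linear model indicates a positive, approximately linear relationship between the explanatory variable (speed) and the response variable (stopping distance). The estimated slope is 3.93, so each additional mph of speed is associated with about 3.93 ft of additional stopping distance, and the slope is highly significant (p = 1.49e-12). The model explains about 65% of the variance in stopping distance (multiple R-squared of 0.6511). The residuals appear approximately normally distributed, although the largest residual (43.20) is greater in magnitude than the smallest (-29.07), which hints at a slightly longer right tail.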