1 Introduction

This study examines the relationship between a car's speed and its stopping distance. The cars dataset consists of 50 observations of two variables: speed, measured in mph, and dist, the stopping distance measured in feet. The goal is to build a model for stopping distance as a function of speed.
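The dataset description above can be checked directly in R. The following is a minimal sketch that uses only base R and the built-in cars data frame; the names n_obs and n_vars are introduced here for illustration only.

```r
# cars ships with base R (datasets package), so no extra packages are needed
n_obs  <- nrow(cars)   # number of observations (rows)
n_vars <- ncol(cars)   # number of variables (columns)

# Show column names and types
str(cars)
```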

2 Dataset - Cars Dataset

library(tidyverse)
library(psych)
library(car)
library(olsrr)
head(cars)
##   speed dist
## 1     4    2
## 2     4   10
## 3     7    4
## 4     7   22
## 5     8   16
## 6     9   10
# describe() from psych reports per-variable summary statistics
describe(cars)

3 Data Visualization

ggplot(cars, aes(speed, dist)) + 
  geom_point(aes(color=speed, alpha = speed)) +
  xlab("Speed (mph)") +
  ylab("Stopping Distance (ft)") +
  ggtitle("Speed Vs. Stopping Distance") +
  theme_light()

4 Linear Model

\[y = b_0 + b_1x_1\]

where \(x_1\) is the input to the system, \(b_0\) is the y-intercept of the line, \(b_1\) is the slope, and \(y\) is the output the model predicts.

Using R's lm() function, we can fit a linear model to our data. R computes the values of \(b_0\) and \(b_1\) using the least squares method.
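For a single predictor, the least squares estimates have a closed form: \(b_1 = \mathrm{cov}(x, y) / \mathrm{var}(x)\) and \(b_0 = \bar{y} - b_1\bar{x}\). The sketch below computes both by hand and compares them with what lm() returns. The names b0_manual, b1_manual, and fit are illustrative and not part of the original analysis.

```r
x <- cars$speed
y <- cars$dist

# Closed-form least squares estimates for simple linear regression
b1_manual <- cov(x, y) / var(x)              # slope
b0_manual <- mean(y) - b1_manual * mean(x)   # intercept

# Fit the same model with lm() and compare the coefficients
fit <- lm(dist ~ speed, data = cars)
all.equal(unname(coef(fit)), c(b0_manual, b1_manual))
```

The comparison should agree up to floating-point tolerance, since lm() solves the same minimization problem with a numerically more stable QR decomposition.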

cars.lm <- lm(dist ~ speed, cars)
summary(cars.lm)
## 
## Call:
## lm(formula = dist ~ speed, data = cars)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -29.069  -9.525  -2.272   9.215  43.201 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -17.5791     6.7584  -2.601   0.0123 *  
## speed         3.9324     0.4155   9.464 1.49e-12 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 15.38 on 48 degrees of freedom
## Multiple R-squared:  0.6511, Adjusted R-squared:  0.6438 
## F-statistic: 89.57 on 1 and 48 DF,  p-value: 1.49e-12

In this model, the y-intercept is \(b_0 = -17.5791\) and the slope is \(b_1 = 3.9324\).

intercept <- coef(cars.lm)[1]
slope <- coef(cars.lm)[2]
# Plot the observed data and overlay the fitted least squares line
ggplot(cars, aes(speed, dist)) +
  geom_point() +
  geom_abline(slope = slope, intercept = intercept)

The fitted linear regression model is \(\widehat{\text{dist}} = -17.5791 + 3.9324 \cdot \text{speed}\).
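To illustrate how the fitted equation is used, the sketch below predicts the stopping distance at a hypothetical speed of 20 mph, once with predict() and once by plugging the coefficients into the equation. The speed value and the names new_speed, pred, and manual are assumptions chosen for illustration.

```r
cars.lm <- lm(dist ~ speed, data = cars)

# Hypothetical input: a car travelling at 20 mph
new_speed <- data.frame(speed = 20)

# Prediction from the fitted model object
pred <- unname(predict(cars.lm, newdata = new_speed))

# Same prediction computed directly from intercept + slope * speed
manual <- unname(coef(cars.lm)[1] + coef(cars.lm)[2] * 20)

c(predict = pred, manual = manual)
```

With the coefficients reported above, both values come out at roughly 61 ft. Note that the negative intercept means the line predicts negative distances at very low speeds, so predictions are only sensible within the observed speed range of the data.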

5 Model Evaluation

The slope estimate indicates that stopping distance increases by about 3.93 ft for each 1 mph increase in speed.

The least squares line explains about 65% of the variance in stopping distance (\(R^2 = 0.6511\), adjusted \(R^2 = 0.6438\)).
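The summary output reports both a multiple and an adjusted \(R^2\). As a sketch of where these numbers come from, the code below recomputes them from the residual and total sums of squares. The variable names are illustrative; p is the number of predictors, which is 1 for this model.

```r
cars.lm <- lm(dist ~ speed, data = cars)

# Residual and total sums of squares
ss_res <- sum(resid(cars.lm)^2)
ss_tot <- sum((cars$dist - mean(cars$dist))^2)

# Multiple R-squared: share of variance in dist explained by the model
r2 <- 1 - ss_res / ss_tot

# Adjusted R-squared: penalizes for the number of predictors p
n <- nrow(cars)
p <- 1
adj_r2 <- 1 - (1 - r2) * (n - 1) / (n - p - 1)

c(r.squared = r2, adj.r.squared = adj_r2)
```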

The p-values indicate that both the intercept (p = 0.0123) and the speed coefficient (p = 1.49e-12) are statistically significant at the 0.05 level.
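Each p-value in the coefficient table comes from a t-test of the estimate against zero. The sketch below reproduces the test for the speed coefficient from its estimate and standard error; the names coefs, t_speed, df_res, and p_speed are introduced here for illustration.

```r
cars.lm <- lm(dist ~ speed, data = cars)

# Coefficient table: Estimate, Std. Error, t value, Pr(>|t|)
coefs <- coef(summary(cars.lm))

# t statistic = estimate divided by its standard error
t_speed <- coefs["speed", "Estimate"] / coefs["speed", "Std. Error"]

# Two-sided p-value from the t distribution with residual degrees of freedom
df_res  <- df.residual(cars.lm)
p_speed <- 2 * pt(abs(t_speed), df = df_res, lower.tail = FALSE)

c(t = t_speed, df = df_res, p = p_speed)
```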

6 Residuals

If our linear model is a good fit for the data, we would expect the residuals to be normally distributed around a mean of zero. In the output of summary(), this means the residuals should have a median near zero, with the minimum and maximum, and the first and third quartiles, roughly symmetric about it.
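These expectations can also be checked numerically. The following sketch extracts the residuals and computes their mean and median, then runs a Shapiro-Wilk normality test as one possible formal check. The test is an addition that is not part of the original analysis, and the variable names are illustrative.

```r
cars.lm <- lm(dist ~ speed, data = cars)
res <- resid(cars.lm)

# With an intercept in the model, least squares residuals average to zero
res_mean <- mean(res)

# The median should match the value shown in the summary() output
res_median <- median(res)

# Shapiro-Wilk test (assumed here as one possible normality check);
# a small p-value suggests a departure from normality
sw <- shapiro.test(res)

c(mean = res_mean, median = res_median, shapiro_p = sw$p.value)
```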

ols_plot_resid_fit(cars.lm)

In the residual plot, we see that the residuals tend to decrease as we move right.

crPlots(cars.lm)

The component-plus-residual (CR) plot also shows that the residuals tend to deviate from the linear regression line.

ols_plot_resid_qq(cars.lm)

The Q-Q plot also shows that there are some outlier values.
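To see which observations sit in the tails of the Q-Q plot, one option is to look at the standardized residuals. The sketch below flags observations whose standardized residual exceeds 2 in absolute value; this cutoff is a common rule of thumb assumed here, not a threshold taken from the original analysis.

```r
cars.lm <- lm(dist ~ speed, data = cars)

# Standardized residuals: residuals scaled by their estimated standard deviation
std_res <- rstandard(cars.lm)

# Flag observations beyond the assumed rule-of-thumb cutoff of 2
flagged <- which(abs(std_res) > 2)
cars[flagged, ]

# Row index of the single largest raw residual
largest <- unname(which.max(abs(resid(cars.lm))))
cars[largest, ]
```

The largest residual belongs to row 49 (speed 24 mph, dist 120 ft), which corresponds to the maximum residual of 43.201 reported in the summary output.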

ols_plot_resid_hist(cars.lm)