This study examines the relationship between a car's speed and its stopping distance. The goal of this section is to build a model of stopping distance as a function of speed. The cars dataset, which ships with R, contains 50 observations of two variables: speed, measured in mph, and dist, the stopping distance, measured in feet.
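library(tidyverse)  # ggplot2 for plotting
library(psych)      # describe() for descriptive statistics
library(car)        # crPlots() for component + residual plots
library(olsrr)      # ols_plot_resid_*() regression diagnostic plots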
head(cars)
## speed dist
## 1 4 2
## 2 4 10
## 3 7 4
## 4 7 22
## 5 8 16
## 6 9 10
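A quick base-R check confirms the shape of the data described in the introduction: 50 rows and two numeric columns, speed and dist. This is a minimal sketch; it relies only on the cars data frame from R's built-in datasets package.

```r
# cars ships with base R's datasets package, so no extra libraries are needed
str(cars)   # column names and types
dim(cars)   # number of rows and columns
```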
describe(cars)  # psych::describe(): n, mean, sd, median, min, max, skew, ... for each column
# Scatter plot of stopping distance against speed
ggplot(cars, aes(speed, dist)) +
  geom_point(aes(color = speed, alpha = speed)) +
  xlab("Speed (mph)") +
  ylab("Stopping distance (ft)") +
  ggtitle("Speed vs. Stopping Distance") +
  theme_light()
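The scatter plot suggests a positive, roughly linear trend. The Pearson correlation puts a number on that trend, and in simple linear regression its square equals the model's \(R^2\), which links this plot to the model fitted next. A minimal sketch using base R's cor(); the variable name r is illustrative.

```r
# Pearson correlation between speed and stopping distance
r <- cor(cars$speed, cars$dist)
r

# With a single predictor, r^2 equals the R-squared of lm(dist ~ speed)
r^2
summary(lm(dist ~ speed, data = cars))$r.squared
```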
A simple linear regression model has the form
\[y = b_0 + b_1 x_1\]
where \(x_1\) is the input to the model (here, speed), \(b_0\) is the y-intercept of the line, \(b_1\) is the slope, and \(y\) is the output the model predicts (here, stopping distance).
Using R's lm() function, we can fit a linear model to our data. lm() computes \(b_0\) and \(b_1\) by the least squares method, that is, by choosing the values that minimize the sum of squared residuals.
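For a single predictor, the least squares estimates have a closed form: \(b_1 = \operatorname{cov}(x, y) / \operatorname{var}(x)\) and \(b_0 = \bar{y} - b_1 \bar{x}\). lm() itself solves the problem through a QR decomposition, but for one predictor the results should agree. A minimal sketch; the names x, y, b0, and b1 are illustrative.

```r
# Closed-form least squares estimates for simple linear regression
x <- cars$speed
y <- cars$dist
b1 <- cov(x, y) / var(x)        # slope
b0 <- mean(y) - b1 * mean(x)    # intercept: the line passes through (mean(x), mean(y))
c(b0 = b0, b1 = b1)

# Compare with the estimates computed by lm()
coef(lm(dist ~ speed, data = cars))
```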
cars.lm <- lm(dist ~ speed, cars)
summary(cars.lm)
##
## Call:
## lm(formula = dist ~ speed, data = cars)
##
## Residuals:
## Min 1Q Median 3Q Max
## -29.069 -9.525 -2.272 9.215 43.201
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -17.5791 6.7584 -2.601 0.0123 *
## speed 3.9324 0.4155 9.464 1.49e-12 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 15.38 on 48 degrees of freedom
## Multiple R-squared: 0.6511, Adjusted R-squared: 0.6438
## F-statistic: 89.57 on 1 and 48 DF, p-value: 1.49e-12
In this model, the y-intercept is \(b_0 = -17.5791\) and the slope is \(b_1 = 3.9324\).
# Extract the fitted coefficients from the model
intercept <- coef(cars.lm)[1]
slope <- coef(cars.lm)[2]

# Plot the data with the fitted least squares line
ggplot(cars, aes(speed, dist)) +
  geom_point() +
  geom_abline(slope = slope, intercept = intercept)
The fitted linear regression model is \(\widehat{\text{dist}} = -17.5791 + 3.9324 \cdot \text{speed}\).
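As a worked example of using the fitted equation, take 20 mph, an illustrative value inside the observed speed range: \(-17.5791 + 3.9324 \times 20 \approx 61.07\) ft. The sketch below does the same arithmetic from the stored coefficients and checks it against predict(), which can also return a 95% prediction interval. The name manual is illustrative.

```r
cars.lm <- lm(dist ~ speed, data = cars)

# Predicted stopping distance (ft) at 20 mph, computed by hand from the coefficients
b <- coef(cars.lm)
manual <- b[["(Intercept)"]] + b[["speed"]] * 20
manual

# The same prediction via predict(), with a 95% prediction interval
predict(cars.lm, newdata = data.frame(speed = 20), interval = "prediction")
```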
The slope means that each additional 1 mph of speed is associated with an increase of about 3.93 ft in expected stopping distance.
The least squares line explains about 65% of the variation in stopping distance: the multiple \(R^2\) is 0.6511 and the adjusted \(R^2\) is 0.6438.
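Both values come from the residual and total sums of squares: \(R^2 = 1 - SSE/SST\), and the adjusted version penalizes the number of predictors \(p\) as \(1 - (1 - R^2)(n - 1)/(n - p - 1)\). A minimal sketch that recomputes them; the variable names are illustrative.

```r
cars.lm <- lm(dist ~ speed, data = cars)

sse <- sum(residuals(cars.lm)^2)              # residual sum of squares
sst <- sum((cars$dist - mean(cars$dist))^2)   # total sum of squares
n <- nrow(cars)                               # 50 observations
p <- 1                                        # one predictor (speed)

r2 <- 1 - sse / sst
adj_r2 <- 1 - (1 - r2) * (n - 1) / (n - p - 1)
c(r2 = r2, adj_r2 = adj_r2)
```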
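The p-values indicate that both coefficients are statistically significant at the 0.05 level: \(p = 1.49 \times 10^{-12}\) for speed and \(p = 0.0123\) for the intercept.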
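Each p-value in the coefficient table comes from a t statistic, the estimate divided by its standard error, compared against a t distribution with \(n - 2 = 48\) degrees of freedom. A minimal sketch that reproduces the t values and two-sided p-values and adds 95% confidence intervals with confint(); the names t_stat and p_val are illustrative.

```r
cars.lm <- lm(dist ~ speed, data = cars)
est <- coef(summary(cars.lm))   # columns: Estimate, Std. Error, t value, Pr(>|t|)

# t = estimate / standard error; two-sided p-value from a t distribution with 48 df
t_stat <- est[, "Estimate"] / est[, "Std. Error"]
p_val <- 2 * pt(-abs(t_stat), df = df.residual(cars.lm))
cbind(t_stat, p_val)

# 95% confidence intervals for b0 and b1
confint(cars.lm)
```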
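If the linear model fits the data well, we would expect the residuals to be normally distributed around a mean of zero. Least squares residuals always average exactly zero when the model includes an intercept, so the median and spread in the summary() output are more informative: the median residual is -2.272, fairly close to zero, and the first and third quartiles (-9.525 and 9.215) are roughly symmetric, but the maximum residual (43.201) is noticeably larger in magnitude than the minimum (-29.069).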
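These residual summaries can be pulled directly from the fitted model. A minimal sketch using residuals() and quantile(); the five quantiles should match the "Residuals:" block printed by summary(), and the name res is illustrative.

```r
cars.lm <- lm(dist ~ speed, data = cars)
res <- residuals(cars.lm)

mean(res)       # zero up to floating-point error, by construction of least squares
median(res)     # about -2.272
quantile(res)   # min, 1st quartile, median, 3rd quartile, max
```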
ols_plot_resid_fit(cars.lm)
In the residuals-versus-fitted plot, the residuals tend to decrease as the fitted values increase, rather than scattering randomly around zero.
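A residual plot is read by eye; a formal check for one common pattern, residual spread that changes with the fitted values, is the score test for non-constant variance in the car package, loaded earlier. This is a sketch of one possible complement to the plot, not a test of the trend described above; the name nv is illustrative.

```r
library(car)
cars.lm <- lm(dist ~ speed, data = cars)

# Non-constant variance score test (a Breusch-Pagan-type test);
# a small p-value suggests the residual spread changes with the fitted values
nv <- ncvTest(cars.lm)
nv
```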
crPlots(cars.lm)
The component-plus-residual (CR) plot also shows the residuals deviating from the linear regression line, which suggests the relationship may not be fully linear.
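One way to test for curvature is to add a quadratic term in speed and compare the nested models with an F-test through anova(). The quadratic term is an illustrative alternative rather than the model chosen in this study, although under constant deceleration braking distance grows with the square of speed, which makes it a natural candidate. The names cars.quad and cmp are illustrative.

```r
cars.lm <- lm(dist ~ speed, data = cars)

# Nested model with an added quadratic term in speed
cars.quad <- lm(dist ~ speed + I(speed^2), data = cars)

# F-test: does the quadratic term significantly reduce the residual sum of squares?
cmp <- anova(cars.lm, cars.quad)
cmp
```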
ols_plot_resid_qq(cars.lm)
The normal Q-Q plot also shows some points departing from the reference line, indicating a few outlying residuals.
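To see which observations are behind those points, one option is to compute standardized residuals with rstandard() and flag those beyond \(\pm 2\). The cutoff of 2 is a common rule of thumb, not a value from this study; the names rs and flagged are illustrative.

```r
cars.lm <- lm(dist ~ speed, data = cars)

# Standardized residuals: each residual scaled by its estimated standard deviation
rs <- rstandard(cars.lm)
flagged <- which(abs(rs) > 2)

cars[flagged, ]          # the flagged observations
round(rs[flagged], 2)    # their standardized residuals
```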
ols_plot_resid_hist(cars.lm)
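The histogram gives a visual impression of the residual distribution. A formal complement is the Shapiro-Wilk normality test from base R's stats package, applied to the residuals; with only 50 observations its power is limited, so it is best read alongside the plots. A minimal sketch; the name sw is illustrative.

```r
cars.lm <- lm(dist ~ speed, data = cars)

# Shapiro-Wilk test; a small p-value is evidence against normally distributed residuals
sw <- shapiro.test(residuals(cars.lm))
sw
```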