Using the “cars” dataset in R, build a linear model for stopping distance as a function of speed and replicate the analysis of your textbook chapter 3 (visualization, quality evaluation of the model, and residual analysis).
Let’s load the cars dataset and take a look at the data.
data(cars)
head(cars, 6)
##   speed dist
## 1     4    2
## 2     4   10
## 3     7    4
## 4     7   22
## 5     8   16
## 6     9   10
length(cars$speed)
## [1] 50
The dataset has 2 columns (speed and dist) and 50 rows.
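As a quick cross-check of these counts, the dimensions and column types can also be read off directly. This is a minimal sketch using only base R; the variable name `dims` is introduced here for illustration.

```r
# Load the built-in dataset (it ships with R's datasets package)
data(cars)

# dim() returns c(rows, columns); str() lists each column with its type
dims <- dim(cars)
print(dims)
str(cars)
```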
A scatterplot displays the relationship between these two variables; let’s also add a regression line.
plot(speed ~ dist, data = cars, main = "Speed vs Distance", xlab = "Distance", ylab = "Speed")
abline(lm(speed ~ dist, data = cars), col = "red") # regression line (y ~ x)
m1 <- lm(speed ~ dist, data = cars)
summary(m1)
##
## Call:
## lm(formula = speed ~ dist, data = cars)
##
## Residuals:
##     Min      1Q  Median      3Q     Max
## -7.5293 -2.1550  0.3615  2.4377  6.4179
##
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept)  8.28391    0.87438   9.474 1.44e-12 ***
## dist         0.16557    0.01749   9.464 1.49e-12 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 3.156 on 48 degrees of freedom
## Multiple R-squared: 0.6511, Adjusted R-squared: 0.6438
## F-statistic: 89.57 on 1 and 48 DF, p-value: 1.49e-12
Here is our regression model:
speed = 0.166*dist + 8.284
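Note that `m1` regresses speed on dist, while the task statement asks for stopping distance as a function of speed. A minimal sketch of the fit in that direction follows; the name `m2` is an assumption introduced for illustration and does not appear in the original analysis.

```r
data(cars)

# Stopping distance as the response, speed as the predictor
m2 <- lm(dist ~ speed, data = cars)
print(coef(m2))

# Scatterplot with distance on the y-axis and the fitted line overlaid
plot(dist ~ speed, data = cars,
     main = "Distance vs Speed", xlab = "Speed", ylab = "Distance")
abline(m2, col = "red")
```

Because this is a simple regression with a single predictor, R-squared is the same in both directions (it is the squared correlation between speed and dist), but the coefficients and residuals differ. The discussion that follows continues with `m1`.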
For a good model, we would typically like to see a standard error that is at least five to ten times smaller than the corresponding coefficient. In m1 the standard errors are about 9.5 times smaller than the estimates for both the intercept and the slope, so there is relatively little variability in the coefficient estimates. Both p-values are very close to 0 (about 1.4e-12 and 1.5e-12), which means that both the slope and the intercept are statistically significant. The reported R-squared of 0.65 means that the model explains about 65 percent of the variation in speed.
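The comparison of each coefficient with its standard error can be computed from the coefficient table rather than read off by eye. This is a sketch that refits the same model so it runs on its own; `coefs`, `ratio`, and `r2` are names introduced here.

```r
data(cars)
m1 <- lm(speed ~ dist, data = cars)

# Coefficient table with columns "Estimate", "Std. Error", "t value", "Pr(>|t|)"
coefs <- summary(m1)$coefficients

# How many times larger each estimate is than its standard error
ratio <- abs(coefs[, "Estimate"]) / coefs[, "Std. Error"]
print(round(ratio, 2))

# Share of the variation in speed explained by the model
r2 <- summary(m1)$r.squared
print(round(r2, 4))
```

This ratio is the absolute value of the t statistic in the summary output, so the "five to ten times" rule of thumb amounts to looking for |t| of at least 5 to 10.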
Let’s take a look at the sum of squared residuals.
suppressWarnings(suppressMessages(library(statsr)))
plot_ss(x = dist, y = speed, data = cars, showSquares = TRUE)
## Click two points to make a line.
## Call:
## lm(formula = y ~ x, data = pts)
##
## Coefficients:
## (Intercept)            x
##      8.2839       0.1656
##
## Sum of Squares: 478.021
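The coefficients printed above match those of the least-squares fit, so the reported sum of squares can be reproduced from the fitted model without the statsr package. A minimal sketch, with `rss` and `rse` as names introduced here:

```r
data(cars)
m1 <- lm(speed ~ dist, data = cars)

# Residual sum of squares: squared vertical distances from the points to the line
rss <- sum(resid(m1)^2)
print(rss)

# Residual standard error = sqrt(RSS / residual degrees of freedom)
rse <- sqrt(rss / df.residual(m1))
print(rse)
```

The second quantity ties the sum of squares back to the "Residual standard error ... on 48 degrees of freedom" line of the model summary.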
Let’s plot the residuals against the fitted values and see what they look like.
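plot(fitted(m1), resid(m1), xlab = "Fitted values", ylab = "Residuals")
abline(h = 0, lty = 2) # reference line at zero residual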
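A residual analysis commonly also checks whether the residuals are roughly normally distributed. One possible next step, sketched here with base R's `qqnorm` and `qqline`, is a normal Q-Q plot; the variable name `res` is introduced for illustration.

```r
data(cars)
m1 <- lm(speed ~ dist, data = cars)
res <- resid(m1)

# Points lying close to the reference line suggest approximately normal residuals
qqnorm(res)
qqline(res)
```

Systematic departures from the line, especially in the tails, would suggest that the normality assumption behind the reported p-values deserves a closer look.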