Let's first look at the structure of the data set.
drive <- cars
glimpse(drive)
## Rows: 50
## Columns: 2
## $ speed <dbl> 4, 4, 7, 7, 8, 9, 10, 10, 10, 11, 11, 12, 12, 12, 12, 13, 13, 13…
## $ dist <dbl> 2, 10, 4, 22, 16, 10, 18, 26, 34, 17, 28, 14, 20, 24, 28, 26, 34…
plot_ly(data = drive, x = ~speed, y = ~dist, type = "scatter", mode = "markers") %>%
  layout(
    title = "Speed vs. Stopping Distance",
    xaxis = list(title = "Speed"),
    yaxis = list(title = "Stopping Distance")
  )
Let's run a linear regression on the data set. The intercept is omitted because, in the context of vehicles, a car that is not moving should have no stopping distance; with the intercept included, its estimate was -17.5791.
model <- lm(dist ~ speed + 0, data = drive)
summary(model)
##
## Call:
## lm(formula = dist ~ speed + 0, data = drive)
##
## Residuals:
## Min 1Q Median 3Q Max
## -26.183 -12.637 -5.455 4.590 50.181
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## speed 2.9091 0.1414 20.58 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 16.26 on 49 degrees of freedom
## Multiple R-squared: 0.8963, Adjusted R-squared: 0.8942
## F-statistic: 423.5 on 1 and 49 DF, p-value: < 2.2e-16
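As a rough check on the decision to drop the intercept, the sketch below refits the model with the intercept included and puts the two fits side by side. The names `model_with_intercept` and `model_origin` are made up for this illustration. Residual standard error is used for the comparison because, for a model without an intercept, `summary()` computes R-squared relative to zero rather than to the mean of `dist`, so the 0.8963 above is not directly comparable to the R-squared of a model with an intercept.

```r
# Built-in data set, same as above.
drive <- cars

# Hypothetical names for the two fits being compared.
model_with_intercept <- lm(dist ~ speed, data = drive)
model_origin <- lm(dist ~ speed + 0, data = drive)

# The intercept estimate that the through-origin model leaves out.
coef(model_with_intercept)

# Residual standard error of each fit, on the same scale (units of dist).
c(with_intercept = sigma(model_with_intercept),
  through_origin = sigma(model_origin))
```

A smaller residual standard error for the model with an intercept would not by itself make that model the better choice here, since a negative stopping distance at zero speed has no physical meaning.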
Note that the residuals tend to increase as we move to the right. Additionally, they are not evenly scattered above and below zero.
plot(fitted(model), resid(model))
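The visual impression from the residual plot can be backed up with two quick numbers. This is only an informal sketch using base R, not a formal heteroscedasticity test: it reports the mean residual (which is not forced to be zero when the intercept is omitted) and the rank correlation between the absolute residuals and the fitted values. The names `res` and `spread_check` are assumptions for this example.

```r
# Refit the through-origin model so the example is self-contained.
drive <- cars
model <- lm(dist ~ speed + 0, data = drive)
res <- resid(model)

# Without an intercept, the residuals need not average to zero.
mean(res)

# Informal check for growing spread: Spearman rank correlation between
# |residual| and fitted value. exact = FALSE avoids the ties warning.
spread_check <- cor.test(abs(res), fitted(model),
                         method = "spearman", exact = FALSE)
spread_check$estimate
spread_check$p.value
```

A clearly positive correlation would agree with the residuals fanning out toward the right of the plot.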
#### We see that the values do not seem to follow a particular pattern; the model seems accurate
qqnorm(resid(model))
qqline(resid(model))
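A Q-Q plot is judged by eye, so it may help to pair it with a numerical check. The sketch below applies the Shapiro-Wilk test from base R to the same residuals. The name `normality_check` is an assumption for this example, and the test is offered as a complement to the plot rather than a replacement for it.

```r
# Refit the through-origin model so the example is self-contained.
drive <- cars
model <- lm(dist ~ speed + 0, data = drive)

# Shapiro-Wilk test: the null hypothesis is that the residuals are normal.
normality_check <- shapiro.test(resid(model))
normality_check
```

A small p-value would suggest that the residuals depart from normality, which would typically show up in the Q-Q plot as points bending away from the reference line.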