There is no need to load the cars data; it ships with R in the built-in datasets package. Below is a preview of the cars data.
head(cars)
##   speed dist
## 1     4    2
## 2     4   10
## 3     7    4
## 4     7   22
## 5     8   16
## 6     9   10
In this linear model, speed is the independent variable and stopping distance is the dependent variable.
The plot shows that stopping distance increases as speed increases.
plot(cars, xlab = "Speed", ylab = "Stopping distance")
This linear model is a single-factor regression: speed is the independent variable (input) and stopping distance is the dependent variable (output).
The intercept is -17.5791. The slope is 3.9324.
The one-factor linear model is:
stopping distance = -17.5791 + 3.9324 * speed
cars.lm <- lm(dist ~ speed, data = cars)
summary(cars.lm)
##
## Call:
## lm(formula = dist ~ speed, data = cars)
##
## Residuals:
##     Min      1Q  Median      3Q     Max
## -29.069  -9.525  -2.272   9.215  43.201
##
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept) -17.5791     6.7584  -2.601   0.0123 *
## speed         3.9324     0.4155   9.464 1.49e-12 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 15.38 on 48 degrees of freedom
## Multiple R-squared: 0.6511, Adjusted R-squared: 0.6438
## F-statistic: 89.57 on 1 and 48 DF, p-value: 1.49e-12
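As a quick check of the fitted equation, the sketch below computes one prediction by hand from the coefficients and compares it with predict(). The speed value of 20 and the variable names new_speed, by_hand, and by_predict are assumptions chosen only for illustration.

```r
# Refit the model so this sketch runs on its own.
cars.lm <- lm(dist ~ speed, data = cars)

# Hypothetical speed, chosen only for illustration.
new_speed <- 20

# Prediction by hand: intercept + slope * speed.
b <- coef(cars.lm)
by_hand <- unname(b["(Intercept)"] + b["speed"] * new_speed)

# The same prediction through predict().
by_predict <- unname(predict(cars.lm, newdata = data.frame(speed = new_speed)))

print(c(by_hand = by_hand, by_predict = by_predict))
```

Both values should agree, since predict() applies the same equation to the new data.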
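The residual quantiles are roughly symmetric about zero (1Q = -9.525, median = -2.272, 3Q = 9.215), which is consistent with approximately normal residuals, although the maximum (43.201) is larger in magnitude than the minimum (-29.069).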
The speed coefficient is about 9.5 times its standard error (3.9324 / 0.4155 ≈ 9.46), which is good. The book states: “For a good model, we typically would like to see a standard error that is at least five to ten times smaller than the corresponding coefficient”.
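The ratio of each coefficient to its standard error can also be computed directly from the coefficient table rather than by hand. This is a minimal sketch; the names coef_table and ratio are introduced here for illustration.

```r
# Refit the model so this sketch runs on its own.
cars.lm <- lm(dist ~ speed, data = cars)

# Coefficient table: Estimate, Std. Error, t value, Pr(>|t|).
coef_table <- coef(summary(cars.lm))

# How many times larger each estimate is than its standard error.
ratio <- abs(coef_table[, "Estimate"]) / coef_table[, "Std. Error"]

print(round(ratio, 3))
```

These ratios are the absolute values of the t value column in the summary output.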
The p-value for the speed coefficient is 1.49e-12. If speed had no linear effect on stopping distance, a t value this extreme would be very unlikely, which means that speed is very relevant in modeling stopping distance.
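To see where the p-value comes from, the sketch below recomputes it from the t value and the residual degrees of freedom using a two-sided t-test. The names t_speed, df_resid, p_by_hand, and p_reported are assumptions made for this example.

```r
# Refit the model so this sketch runs on its own.
cars.lm <- lm(dist ~ speed, data = cars)
coef_table <- coef(summary(cars.lm))

# t value for speed and the residual degrees of freedom (48 here).
t_speed <- coef_table["speed", "t value"]
df_resid <- df.residual(cars.lm)

# Two-sided p-value: probability of a t value at least this extreme
# if the true speed coefficient were zero.
p_by_hand <- 2 * pt(-abs(t_speed), df = df_resid)

# The value reported in the summary table, for comparison.
p_reported <- coef_table["speed", "Pr(>|t|)"]

print(c(p_by_hand = p_by_hand, p_reported = p_reported))
```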
The p-value of the intercept is 0.0123, which means the intercept is statistically significant at the 0.05 level.
The multiple R-squared is 0.6511, which means that this model explains 65.11% of the variation in stopping distance.
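The R-squared value can be reproduced from its definition, one minus the ratio of the residual sum of squares to the total sum of squares. The sketch below is one way to do that; ss_resid, ss_total, and r_squared are names introduced for illustration.

```r
# Refit the model so this sketch runs on its own.
cars.lm <- lm(dist ~ speed, data = cars)

# Variation left unexplained by the model.
ss_resid <- sum(resid(cars.lm)^2)

# Total variation of stopping distance around its mean.
ss_total <- sum((cars$dist - mean(cars$dist))^2)

# Share of the variation explained by the model.
r_squared <- 1 - ss_resid / ss_total

print(round(r_squared, 4))
```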
plot(cars, xlab = "Speed", ylab = "Stopping distance")
abline(cars.lm)
From the book:
“A model that fits the data well would tend to over-predict as often as it under-predicts. Thus, if we plot the residual values, we would expect to see them distributed uniformly around zero for a well-fitted model.”
The plot below shows the residuals scattered roughly uniformly above and below zero.
plot(fitted(cars.lm), resid(cars.lm))
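The book's point about over- and under-prediction can be checked numerically as well as visually. The sketch below counts residuals on each side of zero and redraws the plot with a reference line at zero; the names res, n_under, n_over, and mean_res, and the axis labels, are assumptions for this example.

```r
# Refit the model so this sketch runs on its own.
cars.lm <- lm(dist ~ speed, data = cars)
res <- resid(cars.lm)

# Residual = observed - fitted, so a positive residual means
# the model under-predicted that observation.
n_under <- sum(res > 0)
n_over <- sum(res < 0)

# With an intercept in the model, the residuals average to zero
# (up to floating-point error).
mean_res <- mean(res)

print(c(under_predicted = n_under, over_predicted = n_over))

# Same residual plot, with a dashed reference line at zero.
plot(fitted(cars.lm), res, xlab = "Fitted values", ylab = "Residuals")
abline(h = 0, lty = 2)
```

Roughly equal counts on each side of the line would support the description of the residuals as balanced around zero.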
The normal Q-Q plot below suggests some skew to the right: the points in the upper tail drift away from the reference line.
qqnorm(resid(cars.lm))
qqline(resid(cars.lm))
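The skew seen in the Q-Q plot can be given a rough number with the moment-based sample skewness, computed in base R. This is only a sketch under that definition of skewness; the names res, centered, and skew are introduced here for illustration. A positive value points to a longer right tail.

```r
# Refit the model so this sketch runs on its own.
cars.lm <- lm(dist ~ speed, data = cars)
res <- resid(cars.lm)

# Moment-based sample skewness: third central moment divided by
# the second central moment raised to the power 3/2.
centered <- res - mean(res)
skew <- mean(centered^3) / mean(centered^2)^1.5

print(skew)
```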