Load Data
data("cars")
head(cars)
##   speed dist
## 1     4    2
## 2     4   10
## 3     7    4
## 4     7   22
## 5     8   16
## 6     9   10
Visualize the Data
This scatter plot shows stopping distance against speed. Stopping distance is the dependent (response) variable and the speed of the car is the independent (predictor) variable.
plot(x = cars$speed, y = cars$dist, main = "Speed vs Stopping Distance", xlab = "Speed", ylab="Distance")
Linear Regression Model
For the function \(y = a_0 + a_1 x\), where \(y\) is the stopping distance and \(x\) is the speed, the linear model estimates \(a_0 = -17.579\) and \(a_1 = 3.932\).
cars_lm <- lm(cars$dist ~ cars$speed)
cars_lm
##
## Call:
## lm(formula = cars$dist ~ cars$speed)
##
## Coefficients:
## (Intercept)   cars$speed
##     -17.579        3.932
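As a cross-check, the least-squares estimates for a single predictor can also be computed from the closed-form formulas \(a_1 = \mathrm{cov}(x, y) / \mathrm{var}(x)\) and \(a_0 = \bar{y} - a_1 \bar{x}\). The following is a minimal sketch; the names `x`, `y`, `a1`, and `a0` are introduced here for illustration and are not part of the original analysis.

```r
# Closed-form estimates for simple linear regression (illustrative names)
data("cars")
x <- cars$speed
y <- cars$dist

a1 <- cov(x, y) / var(x)       # slope
a0 <- mean(y) - a1 * mean(x)   # intercept

c(intercept = a0, slope = a1)
```

The two values should agree with the coefficients reported by `lm()`.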
Plotting the linear regression
The fitted line gives the expected stopping distance for a given speed.
plot(cars$speed, cars$dist)
abline(cars_lm)
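To read an expected stopping distance off the line numerically, `predict()` can be used. This sketch refits the same model with the `data =` form of the formula, under the hypothetical name `fit`, because `predict()` matches the columns of `newdata` to the variable names in the formula. The speeds chosen are arbitrary examples.

```r
# Expected stopping distance at a few example speeds
data("cars")
fit <- lm(dist ~ speed, data = cars)   # same model, data-argument form

new_speeds <- data.frame(speed = c(10, 15, 20))   # arbitrary example speeds
pred <- predict(fit, newdata = new_speeds)
pred
```

For a speed of 15, for example, the prediction is \(-17.579 + 3.932 \times 15 \approx 41.4\).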
Summary of the linear regression model
The standard error for speed is about 9.46 times smaller than the corresponding coefficient (3.9324 / 0.4155). This means there is relatively little variability in the slope estimate, \(a_1\).
The standard error for the intercept, \(a_0\), is only about 2.6 times smaller than the magnitude of its coefficient (17.5791 / 6.7584), so the intercept estimate is much less precise than the slope.
The minimum and maximum residuals (-29.07 and 43.20) are roughly comparable in size, though the maximum lies noticeably farther from zero and the median (-2.27) is slightly below zero. This suggests a slight right skew.
The p-value for the car’s speed is very small (1.49e-12), which is strong evidence that the slope is not zero and that speed is an important predictor. The p-value for the intercept is about 0.012: if the true intercept were zero, an estimate at least this extreme would occur only about 1.2% of the time, so the intercept is significant at the 5% level but not at the 1% level.
The Multiple \(R^2\) is 0.65, which means the model explains about 65% of the variation in stopping distance.
summary(cars_lm)
##
## Call:
## lm(formula = cars$dist ~ cars$speed)
##
## Residuals:
##     Min      1Q  Median      3Q     Max
## -29.069  -9.525  -2.272   9.215  43.201
##
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept) -17.5791     6.7584  -2.601   0.0123 *
## cars$speed    3.9324     0.4155   9.464 1.49e-12 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 15.38 on 48 degrees of freedom
## Multiple R-squared: 0.6511, Adjusted R-squared: 0.6438
## F-statistic: 89.57 on 1 and 48 DF, p-value: 1.49e-12
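The coefficient-to-standard-error ratios and the \(R^2\) value discussed in this section can be recomputed from the model object. This is a minimal sketch that refits the model under the hypothetical name `fit`; the names `coefs`, `t_ratio`, `rss`, `tss`, and `r2` are also illustrative.

```r
# Recompute the t ratios and R-squared from the fitted model
data("cars")
fit <- lm(dist ~ speed, data = cars)

coefs <- coef(summary(fit))   # matrix: Estimate, Std. Error, t value, Pr(>|t|)
t_ratio <- coefs[, "Estimate"] / coefs[, "Std. Error"]
t_ratio                       # same as the "t value" column

rss <- sum(resid(fit)^2)                      # residual sum of squares
tss <- sum((cars$dist - mean(cars$dist))^2)   # total sum of squares
r2 <- 1 - rss / tss
r2
```

The ratio of each estimate to its standard error is the t value in the summary table, and \(1 - \mathrm{RSS}/\mathrm{TSS}\) reproduces the Multiple R-squared.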
Residual Analysis
The residuals are the differences between the observed values and the fitted values on the regression line. The line is a good fit if the residuals are approximately normally distributed around a mean of zero. We can check this with a histogram.
The mean of the residuals is zero up to floating-point rounding, which is always the case for a least-squares fit with an intercept. The histogram looks roughly normal, with perhaps a slight skew to the right.
mean(cars_lm$residuals)
## [1] 8.65974e-17
hist(cars_lm$residuals, breaks = 35)
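The definition of a residual used here (observed value minus fitted value) can be checked directly. This sketch refits the model under the hypothetical name `fit`; `manual_resid` is likewise an illustrative name.

```r
# Residual = observed stopping distance - fitted stopping distance
data("cars")
fit <- lm(dist ~ speed, data = cars)

manual_resid <- cars$dist - fitted(fit)

# Should match the residuals stored in the model object
all.equal(unname(manual_resid), unname(resid(fit)))

# For a least-squares fit with an intercept the residuals sum to
# (numerically) zero, which is why their mean is on the order of 1e-17
sum(manual_resid)
```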
In the plot of residuals against fitted values it is hard to tell whether there is any skew. The residuals look fairly evenly scattered around zero.
plot(fitted(cars_lm), resid(cars_lm))
Q-Q plot
The Q-Q plot compares the quantiles of the residuals with the quantiles of a normal distribution. If the residuals are normally distributed, the points follow the straight line. Here the points lie fairly close to the line, so the normality assumption looks reasonable. Together with the results above, this suggests that the car’s speed is a good indicator of the stopping distance.
qqnorm(resid(cars_lm))
qqline(resid(cars_lm))
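To make explicit what the Q-Q plot draws, the same points can be built by hand: the sorted residuals are plotted against normal quantiles at the probabilities given by `ppoints()`. This is a sketch for illustration only; the model is refit under the hypothetical name `fit`, and `res`, `theoretical`, and `sample_q` are illustrative names.

```r
# Build the Q-Q plot coordinates manually
data("cars")
fit <- lm(dist ~ speed, data = cars)
res <- resid(fit)

n <- length(res)
theoretical <- qnorm(ppoints(n))   # normal quantiles at n plotting positions
sample_q <- sort(res)              # ordered residuals

plot(theoretical, sample_q,
     xlab = "Theoretical Quantiles", ylab = "Sample Quantiles")
```

Points that fall close to a straight line indicate that the ordered residuals grow at the rate expected under a normal distribution.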