Load Data

data("cars")
head(cars)
##   speed dist
## 1     4    2
## 2     4   10
## 3     7    4
## 4     7   22
## 5     8   16
## 6     9   10

Visualize the Data

This scatter plot shows stopping distance against speed. Stopping distance is treated as the dependent variable and the car’s speed as the independent variable.

plot(x = cars$speed, y = cars$dist, main = "Speed vs Stopping Distance", xlab = "Speed", ylab="Distance")

Linear Regression Model

For the function \(y = a_0 + a_1x_1\), where \(y\) is the stopping distance and \(x_1\) is the speed, the linear model estimates \(a_0 = -17.579\) and \(a_1 = 3.932\).

cars_lm <- lm(cars$dist ~ cars$speed)
cars_lm
## 
## Call:
## lm(formula = cars$dist ~ cars$speed)
## 
## Coefficients:
## (Intercept)   cars$speed  
##     -17.579        3.932
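
As a cross-check on the coefficients above, the sketch below recomputes them from the closed-form least-squares formulas for a single predictor, \(a_1 = \mathrm{cov}(x_1, y) / \mathrm{var}(x_1)\) and \(a_0 = \bar{y} - a_1\bar{x}_1\). The names `x`, `y`, `manual`, and `fitted_coefs` are illustrative and not part of the original analysis, and the model is refit with the equivalent formula `dist ~ speed`.

```r
# Closed-form least-squares estimates for one predictor.
# Variable names here are illustrative assumptions.
x <- cars$speed
y <- cars$dist

a1 <- cov(x, y) / var(x)       # slope
a0 <- mean(y) - a1 * mean(x)   # intercept

manual <- c(a0 = a0, a1 = a1)

# Same model as cars_lm, written with a data argument
fitted_coefs <- unname(coef(lm(dist ~ speed, data = cars)))

print(round(manual, 3))
print(all.equal(unname(manual), fitted_coefs))
```

Both routes should agree to within floating-point tolerance, since `lm()` solves the same least-squares problem.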

Plotting the linear regression

The line shows the expected stopping distance for a given speed.

plot(cars$speed, cars$dist)
abline(cars_lm)
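
To read an expected stopping distance off the line numerically, one option is `predict()`. This is a minimal sketch that assumes the model is refit as `lm(dist ~ speed, data = cars)`, because `predict()` can only match a `newdata` column named `speed` when the formula uses bare column names; the speed of 15 and the names `fit`, `new_speed`, `pred`, and `by_hand` are chosen only for illustration.

```r
# Refit with bare column names so newdata can be matched by name.
# `fit` and the speed value 15 are illustrative assumptions.
fit <- lm(dist ~ speed, data = cars)

new_speed <- data.frame(speed = 15)
pred <- predict(fit, newdata = new_speed)

# The same value from the fitted equation y = a0 + a1 * x1
by_hand <- coef(fit)[["(Intercept)"]] + coef(fit)[["speed"]] * 15

print(unname(pred))   # about 41.4
print(by_hand)
```

In the `cars` dataset, speed is recorded in mph and distance in feet, so this corresponds to roughly 41.4 ft at 15 mph.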

Summary of the linear regression model

The standard error for speed is about 9.46 times smaller than the corresponding coefficient (3.9324 / 0.4155). This means there is relatively little variability in the slope estimate, \(a_1\).

The intercept, \(a_0\), is only about 2.6 times its standard error in magnitude (17.5791 / 6.7584), which means this estimate is much less precise and could vary a lot.

The minimum and maximum residuals are of broadly similar magnitude, though the maximum (43.201) lies farther from zero than the minimum (-29.069). Together with the slightly negative median, this points to a slight right skew.

The p-value for the car’s speed is very small (1.49e-12), which is strong evidence that speed contributes to the model. The p-value for the intercept is about 0.012: if the true intercept were zero, an estimate this far from zero would be expected only about 1.2% of the time.

The Multiple \(R^2\) is 0.65, which means the model explains about 65% of the variation in stopping distance.

summary(cars_lm)
## 
## Call:
## lm(formula = cars$dist ~ cars$speed)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -29.069  -9.525  -2.272   9.215  43.201 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -17.5791     6.7584  -2.601   0.0123 *  
## cars$speed    3.9324     0.4155   9.464 1.49e-12 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 15.38 on 48 degrees of freedom
## Multiple R-squared:  0.6511, Adjusted R-squared:  0.6438 
## F-statistic: 89.57 on 1 and 48 DF,  p-value: 1.49e-12
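
The ratios and the \(R^2\) value discussed in this section can be recomputed directly from the fitted model. The sketch below assumes the model is refit as `lm(dist ~ speed, data = cars)`; the names `fit`, `coefs`, `ratio`, `rss`, `tss`, and `r2` are illustrative.

```r
# Illustrative names; same model as cars_lm with a data argument.
fit <- lm(dist ~ speed, data = cars)

# Coefficient table: Estimate, Std. Error, t value, Pr(>|t|)
coefs <- coef(summary(fit))

# How many standard errors each estimate lies from zero
ratio <- abs(coefs[, "Estimate"]) / coefs[, "Std. Error"]
print(round(ratio, 3))

# R-squared as 1 - (residual sum of squares / total sum of squares)
rss <- sum(resid(fit)^2)
tss <- sum((cars$dist - mean(cars$dist))^2)
r2 <- 1 - rss / tss
print(round(r2, 4))
```

The ratios equal the absolute values of the `t value` column, which is simply the estimate divided by its standard error.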

Residual Analysis

The residuals are the differences between the observed values and the values predicted by the regression line. The line is a good fit if the residuals are normally distributed around a mean of zero. We can visualize this with a plot.

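The mean of the residuals is very close to zero (a least-squares fit with an intercept always gives residuals that average to zero, up to rounding), and the histogram looks roughly normal, though perhaps with a slight skew to the right.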

mean(cars_lm$residuals)
## [1] 8.65974e-17
hist(cars_lm$residuals, breaks = 35)
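
To make the definition of a residual concrete, the sketch below computes observed minus fitted values by hand and compares them with `resid()`. It assumes the model is refit as `lm(dist ~ speed, data = cars)`; the names `fit`, `by_hand`, `same`, `res_mean`, and `res_median` are illustrative.

```r
# Illustrative names; same model as cars_lm with a data argument.
fit <- lm(dist ~ speed, data = cars)

# Residual = observed stopping distance - fitted stopping distance
by_hand <- cars$dist - fitted(fit)
same <- isTRUE(all.equal(unname(by_hand), unname(resid(fit))))
print(same)

# A mean of zero with a negative median hints at a right skew
res_mean <- mean(by_hand)
res_median <- median(by_hand)
print(c(mean = res_mean, median = res_median))
```

Comparing the mean with the median gives a quick numeric counterpart to reading skew off the histogram.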

With this plot of residuals against fitted values, it is hard to tell whether there is any skew. The residuals look evenly distributed around zero.

plot(fitted(cars_lm), resid(cars_lm))

Q-Q plot

This plot shows how closely the residuals follow a normal distribution. If the residuals are normally distributed, the points follow the straight line. Our residuals follow the line fairly well, which supports the model and suggests that the car’s speed is a good indicator of the stopping distance.

qqnorm(resid(cars_lm))
qqline(resid(cars_lm))
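
For readers curious what the Q-Q plot is built from, the sketch below extracts the plotted coordinates without drawing anything and summarizes how straight the point pattern is with a correlation. The correlation is an informal summary added here for illustration, not part of the original analysis, and the names `fit`, `res`, `qq`, `theo`, and `qq_cor` are assumptions.

```r
# Illustrative names; same model as cars_lm with a data argument.
fit <- lm(dist ~ speed, data = cars)
res <- resid(fit)

# qqnorm() returns theoretical (x) and sample (y) quantiles
qq <- qqnorm(res, plot.it = FALSE)

# The theoretical quantiles are normal quantiles at ppoints(n)
theo <- qnorm(ppoints(length(res)))
print(isTRUE(all.equal(sort(qq$x), theo)))

# Correlation near 1 means the points lie close to a straight line
qq_cor <- cor(qq$x, qq$y)
print(qq_cor)
```

A value close to 1 is consistent with approximately normal residuals, although it does not replace looking at the tails of the plot.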