Plotting the Data

Fit a Linear Model

I used the lm function to model stopping distance as a linear function of speed.

model <- lm(dist ~ speed, data = cars)

Display Line of Best Fit
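
The plot itself is not reproduced here, so the following is only a sketch of one way to draw it with base R graphics, not necessarily the code used originally. The axis labels are my own wording; the units (mph and feet) come from the documentation of the built-in cars dataset.

```r
# Refit the model so this block runs on its own.
model <- lm(dist ~ speed, data = cars)

# Scatter plot of the raw data.
plot(dist ~ speed, data = cars,
     xlab = "Speed (mph)", ylab = "Stopping distance (ft)")

# abline() accepts a fitted lm object and draws its intercept and slope.
abline(model, col = "red", lwd = 2)
```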


Look at the Summary

## 
## Call:
## lm(formula = dist ~ speed, data = cars)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -29.069  -9.525  -2.272   9.215  43.201 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -17.5791     6.7584  -2.601   0.0123 *  
## speed         3.9324     0.4155   9.464 1.49e-12 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 15.38 on 48 degrees of freedom
## Multiple R-squared:  0.6511, Adjusted R-squared:  0.6438 
## F-statistic: 89.57 on 1 and 48 DF,  p-value: 1.49e-12
## MODEL INFO:
## Observations: 50
## Dependent Variable: dist
## Type: OLS linear regression 
## 
## MODEL FIT:
## F(1,48) = 89.57, p = 0.00
## R² = 0.65
## Adj. R² = 0.64 
## 
## Standard errors: OLS
## -------------------------------------------------
##                       Est.   S.E.   t val.      p
## ----------------- -------- ------ -------- ------
## (Intercept)         -17.58   6.76    -2.60   0.01
## speed                 3.93   0.42     9.46   0.00
## -------------------------------------------------
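
The numbers printed above can also be pulled out of the summary object directly, which is handy when we want to reuse them in later calculations. This is a minimal sketch using base R only; the variable names are illustrative.

```r
# Refit the model so this block runs on its own.
model <- lm(dist ~ speed, data = cars)

# summary() returns an object; coef() on it gives the coefficient table
# as a matrix with columns Estimate, Std. Error, t value and Pr(>|t|).
model_summary <- summary(model)
coef_table <- coef(model_summary)

print(round(coef_table, 4))
```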

Residuals

The residuals are the differences between the observed values and the values predicted by the model (observed minus predicted). The minimum, -29.069, is the model’s largest overestimate: the observed stopping distance was about 29 feet shorter than predicted. The maximum, 43.201, is its largest underestimate. The median residual is -2.272. The residuals tell us more when we compare them with the model’s predictions:

We see that the values are fairly evenly distributed around 0. The spread of the residuals does tend to increase as we move to the right, meaning the model’s predictions are less precise at larger values.
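
A residuals-versus-fitted plot like the one described can be sketched as follows. This is one way to do it in base R under my own choice of labels, not necessarily how the original figure was made.

```r
# Refit the model so this block runs on its own.
model <- lm(dist ~ speed, data = cars)

# Residual = observed - fitted, so a negative value means the model
# predicted a longer stopping distance than was observed.
res <- resid(model)
print(summary(res))

# Residuals against fitted values, with a reference line at zero.
plot(fitted(model), res,
     xlab = "Fitted stopping distance", ylab = "Residual")
abline(h = 0, lty = 2)
```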

We can use a Q-Q plot to visualize whether or not the residuals are normally distributed:

The residuals don’t deviate too much from the line, apart from a few notable aberrations.
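
For reference, a normal Q-Q plot of the residuals can be produced with base R as in this sketch (the plot title is my own wording):

```r
# Refit the model so this block runs on its own.
model <- lm(dist ~ speed, data = cars)
res <- resid(model)

# qqnorm() plots sample quantiles against theoretical normal quantiles
# and invisibly returns the plotted coordinates.
qq <- qqnorm(res, main = "Normal Q-Q plot of residuals")

# qqline() adds a reference line through the first and third quartiles.
qqline(res)
```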

Coefficients

We can now look at the coefficients. The intercept is -17.5791, meaning that the predicted value on the y-axis (stopping distance) is -17.5791 when the value on the x-axis (speed) is 0. A negative stopping distance doesn’t really make sense; this is just extrapolation beyond the observed speeds. The standard error for the intercept is 6.76. The speed coefficient is 3.93 with a standard error of 0.42, so each additional unit of speed adds about 3.93 units of predicted stopping distance. What we care about is the size of each coefficient relative to its standard error, which is conveniently recorded in the t value column. The intercept is 2.60 standard errors below zero, while the speed coefficient is 9.46 standard errors above zero. The smaller the error in comparison to the coefficient, the better. In this case we have a good indication that speed is a good predictor.

The p-value relates to the t-statistic: it is the probability of observing a t-statistic at least this far from zero if the true coefficient were zero. In our data the p-value is 0.0123 for the intercept and, more importantly, 1.49e-12 for the speed variable. This means that it would be extremely unlikely to see a speed coefficient this large relative to its standard error if speed had no linear relationship with stopping distance.
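
To make the relationship concrete, the t value is simply the estimate divided by its standard error. A short check in base R (variable names are illustrative):

```r
# Refit the model so this block runs on its own.
model <- lm(dist ~ speed, data = cars)
coef_table <- coef(summary(model))

# t value = estimate / standard error, computed for both coefficients.
t_manual <- coef_table[, "Estimate"] / coef_table[, "Std. Error"]

print(round(t_manual, 3))
```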
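
The p-values in the table can be recovered from the t values with the t distribution on the residual degrees of freedom. This sketch assumes the usual two-sided test:

```r
# Refit the model so this block runs on its own.
model <- lm(dist ~ speed, data = cars)
coef_table <- coef(summary(model))
t_values <- coef_table[, "t value"]

# Two-sided p-value: twice the upper-tail probability of |t|
# under a t distribution with the residual degrees of freedom.
p_manual <- 2 * pt(abs(t_values), df = df.residual(model), lower.tail = FALSE)

print(signif(p_manual, 3))
```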

Residual Standard Error

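This shows us the typical amount that the observed responses deviate from the regression line; it is the standard deviation of the residuals, adjusted for the degrees of freedom. For this model that is 15.38 on 48 degrees of freedom (50 observations minus the 2 estimated coefficients).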
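
As a quick check, the residual standard error can be computed by hand as the square root of the residual sum of squares divided by the residual degrees of freedom:

```r
# Refit the model so this block runs on its own.
model <- lm(dist ~ speed, data = cars)

# Residual sum of squares and residual degrees of freedom (n - 2 here).
rss <- sum(resid(model)^2)
df_res <- df.residual(model)

# Residual standard error; sigma(model) returns the same quantity.
rse_manual <- sqrt(rss / df_res)

print(c(manual = rse_manual, builtin = sigma(model), df = df_res))
```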

Multiple R-squared

This value shows us what proportion of the variability in the response is explained by the predictor, speed. Our value is 0.6511, which means 65.11% of the variability in stopping distance is explained by the predictors, in this case just speed.
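
R-squared can be reproduced from its definition, one minus the ratio of the residual sum of squares to the total sum of squares. A minimal sketch:

```r
# Refit the model so this block runs on its own.
model <- lm(dist ~ speed, data = cars)

# Variation left unexplained by the model.
rss <- sum(resid(model)^2)

# Total variation of stopping distance around its mean.
tss <- sum((cars$dist - mean(cars$dist))^2)

r2_manual <- 1 - rss / tss

print(c(manual = r2_manual, builtin = summary(model)$r.squared))
```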

F-statistic

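This statistic tests the overall significance of the model by comparing it with a model that has no predictors. Here it is 89.57 on 1 and 48 degrees of freedom, with a p-value of 1.49e-12, so the model as a whole is highly significant.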
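
The F-statistic and its p-value can be extracted from the summary object as sketched below. With a single predictor, the F-statistic equals the square of that predictor’s t value, which is why the model p-value matches the one for speed.

```r
# Refit the model so this block runs on its own.
model <- lm(dist ~ speed, data = cars)

# fstatistic is a named vector: value, numdf, dendf.
f <- summary(model)$fstatistic

# Upper-tail probability of the F distribution gives the model p-value.
p_model <- pf(f["value"], f["numdf"], f["dendf"], lower.tail = FALSE)

# With one predictor, F equals the squared t value of that predictor.
t_speed <- coef(summary(model))["speed", "t value"]

print(c(F = unname(f["value"]), t_squared = t_speed^2, p = unname(p_model)))
```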