Using the “cars” dataset in R, build a linear model for stopping distance as a function of speed and replicate the analysis from chapter 3 of your textbook.

summary(cars)
##      speed           dist       
##  Min.   : 4.0   Min.   :  2.00  
##  1st Qu.:12.0   1st Qu.: 26.00  
##  Median :15.0   Median : 36.00  
##  Mean   :15.4   Mean   : 42.98  
##  3rd Qu.:19.0   3rd Qu.: 56.00  
##  Max.   :25.0   Max.   :120.00
library(dplyr)  # provides glimpse()
glimpse(cars)
## Rows: 50
## Columns: 2
## $ speed <dbl> 4, 4, 7, 7, 8, 9, 10, 10, 10, 11, 11, 12, 12, 12, 12, 13, 13, 13…
## $ dist  <dbl> 2, 10, 4, 22, 16, 10, 18, 26, 34, 17, 28, 14, 20, 24, 28, 26, 34…

Create a scatterplot and check whether the relationship is approximately linear

plot(cars$speed, cars$dist, xlab="Speed", ylab="Stopping Distance", main="Stopping Distance vs Speed")

The relationship looks approximately linear, so we will proceed with fitting a linear regression model.
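One way to back up the visual judgement is to overlay a straight-line fit and a local smooth on the same scatterplot. If the two curves track each other closely, a linear form is a reasonable starting point. The sketch below uses only base R; the colours, line types, and the variable name `linear_r` are illustrative choices, not part of the original analysis.

```r
# Scatterplot of the raw data (speed in mph, stopping distance in ft)
plot(cars$speed, cars$dist, xlab = "Speed", ylab = "Stopping Distance",
     main = "Stopping Distance vs Speed")

# Straight-line least-squares fit
abline(lm(dist ~ speed, data = cars), col = "blue")

# Lowess smooth: a data-driven curve that is not forced to be linear
lines(lowess(cars$speed, cars$dist), col = "red", lty = 2)

# Pearson correlation as a single-number summary of linear association
linear_r <- cor(cars$speed, cars$dist)
print(linear_r)
```

A smooth that bends away from the straight line at high speeds would be a hint that a transformation or a quadratic term is worth trying.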

Fitting linear regression model

model <- lm(dist ~ speed, data=cars)
summary(model)
## 
## Call:
## lm(formula = dist ~ speed, data = cars)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -29.069  -9.525  -2.272   9.215  43.201 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -17.5791     6.7584  -2.601   0.0123 *  
## speed         3.9324     0.4155   9.464 1.49e-12 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 15.38 on 48 degrees of freedom
## Multiple R-squared:  0.6511, Adjusted R-squared:  0.6438 
## F-statistic: 89.57 on 1 and 48 DF,  p-value: 1.49e-12

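The summary shows that both the intercept (p = 0.0123) and the speed coefficient (p = 1.49e-12) are significant at the 5% significance level. The slope implies that each additional 1 mph of speed is associated with about 3.93 ft of additional stopping distance. The multiple R-squared is 0.65 (adjusted 0.64), so speed explains roughly 65% of the variance in stopping distance. The F-statistic (89.57 on 1 and 48 degrees of freedom, p = 1.49e-12) indicates the model as a whole is significant.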
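The coefficient table can be supplemented with interval estimates, which show how precisely the slope is determined and what the model implies for a new observation. The sketch below refits the same model so that it runs on its own; the speed of 20 mph is a hypothetical value chosen only for illustration.

```r
# Refit the model so this block is self-contained
model <- lm(dist ~ speed, data = cars)

# 95% confidence intervals for the intercept and slope
ci <- confint(model, level = 0.95)
print(ci)

# Point prediction and 95% prediction interval at a hypothetical speed
new_speed <- data.frame(speed = 20)  # illustrative value, not from the source
pred <- predict(model, newdata = new_speed,
                interval = "prediction", level = 0.95)
print(pred)
```

The prediction interval is wider than a confidence interval for the mean response because it also accounts for the residual scatter around the line (residual standard error 15.38).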

plot(model)

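The Residuals vs Fitted plot shows no serious heteroskedasticity, and the absence of a clear trend suggests that the linear form is adequate. Independence of the residuals cannot be read from this plot alone; it depends mainly on how the data were collected. The Q-Q plot shows that the residuals are approximately normal, although a few points depart from the line in the upper tail, consistent with the largest residual (43.2) being larger in magnitude than the smallest (-29.1). Whether those points are influential is better judged from the Residuals vs Leverage plot than from the Q-Q plot.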
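The visual diagnostics can be paired with simple numerical checks. The sketch below uses only functions from base R's `stats` package: a Shapiro-Wilk test for normality of the residuals, an auxiliary regression of the squared residuals on speed as a rough check for non-constant variance (the idea behind the Breusch-Pagan test, not a packaged implementation of it), and Cook's distance for influence. The cutoff of 4/n is a common rule of thumb and is an assumption here, not a threshold taken from the original analysis.

```r
# Refit the model so this block is self-contained
model <- lm(dist ~ speed, data = cars)
res <- residuals(model)

# Normality: Shapiro-Wilk test on the residuals
normality <- shapiro.test(res)
print(normality)

# Constant variance: do squared residuals grow with speed?
# A significant slope here would suggest heteroskedasticity.
aux <- lm(I(res^2) ~ speed, data = cars)
print(summary(aux)$coefficients)

# Influence: Cook's distance, flagged with the 4/n rule of thumb (assumed cutoff)
cooks <- cooks.distance(model)
influential <- which(cooks > 4 / nrow(cars))
print(influential)
```

Any observations flagged by the last step are candidates for a sensitivity check, for example refitting the model without them and comparing the slope.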

Based on this analysis, the fitted model appears to meet the assumptions of linear regression reasonably well.