Question

Using the cars dataset in R, build a linear model for stopping distance as a function of speed and replicate the analysis of your textbook chapter 3 (visualization, quality evaluation of the model, and residual analysis).

# loading libraries
library(tidyr)
library(knitr)
library(kableExtra)


# Creating a copy of cars dataset
cars_ds <- cars
head(cars_ds) %>% kable() 
speed dist
4 2
4 10
7 4
7 22
8 16
9 10
# Plotting the data in a scatterplot to see the relationship between speed and distance
library(ggplot2)
# Each layer must be joined with `+`; otherwise labs() is evaluated on its own and the title is never added to the plot
ggplot(cars_ds, aes(x = speed, y = dist)) +
  geom_point() +
  geom_smooth() +
  theme_classic() +
  labs(title = "Relationship between distance and speed")
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'

The scatterplot suggests a positive relationship between speed and distance. Now let’s check it by fitting a linear regression in R.

# Fitting a linear regression model
model <- lm(dist ~ speed, cars_ds)
summary(model)
## 
## Call:
## lm(formula = dist ~ speed, data = cars_ds)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -29.069  -9.525  -2.272   9.215  43.201 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -17.5791     6.7584  -2.601   0.0123 *  
## speed         3.9324     0.4155   9.464 1.49e-12 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 15.38 on 48 degrees of freedom
## Multiple R-squared:  0.6511, Adjusted R-squared:  0.6438 
## F-statistic: 89.57 on 1 and 48 DF,  p-value: 1.49e-12

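According to the results, speed has a positive and statistically significant relationship with stopping distance (p-value: 1.49e-12). Speed explains 65.11 percent of the variance in dist (Multiple R-squared: 0.6511). Each 1 mph increase in speed is associated with an increase of about 3.93 ft in stopping distance.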
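The figures quoted above can be pulled directly from the fitted object instead of being read off the printed summary. The following is a minimal sketch that refits the same model on the built-in cars data; the object names fit, r2, slope, and slope_ci are illustrative and not part of the original analysis.

```r
# Refit the same model on the built-in cars data so this block is self-contained
fit <- lm(dist ~ speed, data = cars)

# Proportion of the variance in dist explained by speed
r2 <- summary(fit)$r.squared

# Estimated slope (change in dist per 1 mph) and its 95% confidence interval
slope <- coef(fit)["speed"]
slope_ci <- confint(fit, "speed", level = 0.95)

print(round(r2, 4))
print(round(slope, 4))
print(slope_ci)
```

The confidence interval gives a range of plausible values for the slope, which is often more informative than the point estimate alone.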

The fitted equation is:

Dist = -17.5791 + 3.9324(Speed)
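As a quick check of how the equation is used, the sketch below predicts the stopping distance at 20 mph, once by plugging into the equation and once with predict(). The speed of 20 mph and the object names (fit, new_speed, by_hand, by_predict, pred_int) are assumptions chosen for illustration.

```r
# Refit the same model on the built-in cars data so this block is self-contained
fit <- lm(dist ~ speed, data = cars)

# Hypothetical speed at which to predict (20 mph is an arbitrary example value)
new_speed <- data.frame(speed = 20)

# Prediction by hand: intercept + slope * speed
by_hand <- unname(coef(fit)["(Intercept)"] + coef(fit)["speed"] * 20)

# Same prediction through predict()
by_predict <- unname(predict(fit, newdata = new_speed))

# Point prediction together with a 95% prediction interval for a single new car
pred_int <- predict(fit, newdata = new_speed, interval = "prediction")

print(by_hand)
print(by_predict)
print(pred_int)
```

Both approaches should give about 61.07 ft. Note that the negative intercept makes the line predict negative distances at very low speeds, so the equation should not be extrapolated far outside the observed speeds.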

# Residual analysis

par(mfrow=c(2,2))
plot(model)

No clear pattern can be seen in the residual plots. The residuals appear randomly scattered around zero, so the constant-variance (homoskedasticity) assumption appears to be satisfied. The residuals also look approximately normally distributed. So we can say that the model’s quality is good and we can rely on the conclusion: there is a positive relationship between distance and speed.
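The visual reading of the diagnostic plots can be supplemented with formal tests. The sketch below is one possible approach using base R only: a Shapiro-Wilk test for normality of the residuals, and a studentized Breusch-Pagan-style check for constant variance built by hand from an auxiliary regression. The hand-built check and the object names (fit, res, sw, aux, bp_stat, bp_p) are assumptions for illustration and were not part of the original analysis.

```r
# Refit the same model on the built-in cars data so this block is self-contained
fit <- lm(dist ~ speed, data = cars)
res <- residuals(fit)

# Shapiro-Wilk test: the null hypothesis is that the residuals are normally distributed
sw <- shapiro.test(res)

# Constant-variance check: regress the squared residuals on the predictor.
# Under constant variance, n * R^2 from this auxiliary regression is
# approximately chi-squared with 1 degree of freedom (one predictor).
aux <- lm(I(res^2) ~ speed, data = cars)
bp_stat <- nrow(cars) * summary(aux)$r.squared
bp_p <- pchisq(bp_stat, df = 1, lower.tail = FALSE)

print(sw)
print(c(statistic = bp_stat, p.value = bp_p))
```

A small p-value in either test would be evidence against the corresponding assumption, in which case the conclusions drawn from the plots should be revisited.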