Using the “cars” dataset in R, build a linear model for stopping distance as a function of speed and replicate the analysis of your textbook chapter 3 (visualization, quality evaluation of the model, and residual analysis).
The help page for cars (?cars), titled "Speed and Stopping Distances of Cars", describes the dataset:
The data give the speed of cars and the distances taken to stop. Note that the data were recorded in the 1920s.
Usage
cars
Format
A data frame with 50 observations on 2 variables.
speed: numeric, speed (mph)
dist: numeric, stopping distance (ft)
Loading the dataset
dataset_cars <- cars
summary(dataset_cars)
##      speed           dist
##  Min.   : 4.0   Min.   :  2.00
##  1st Qu.:12.0   1st Qu.: 26.00
##  Median :15.0   Median : 36.00
##  Mean   :15.4   Mean   : 42.98
##  3rd Qu.:19.0   3rd Qu.: 56.00
##  Max.   :25.0   Max.   :120.00
library(ggplot2)
# Scatterplot of stopping distance against speed
ggplot(dataset_cars, aes(x = speed, y = dist)) +
  geom_point(color = 'red') +
  ggtitle('speed and stopping distance of cars') +
  xlab('speed (mph)') +
  ylab('stopping distance (ft)')
The scatterplot suggests that speed and stopping distance are positively correlated. To quantify how closely the two are related, we fit a linear model of stopping distance on speed with the lm function and then assess the model.
regressor <- lm(formula = dist ~ speed,
                data = dataset_cars)
regressor
##
## Call:
## lm(formula = dist ~ speed, data = dataset_cars)
##
## Coefficients:
## (Intercept)        speed
##     -17.579        3.932
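The coefficients printed above define a straight line, so they can be used directly to predict a stopping distance for a given speed. The following is a minimal sketch of that arithmetic, checked against R's own predict(). The object new_speeds and the speeds in it are hypothetical values chosen only for illustration, and the model is refit so the snippet runs on its own.

```r
# Refit the model so this snippet is self-contained
dataset_cars <- cars
regressor <- lm(dist ~ speed, data = dataset_cars)

# Extract the intercept and slope from the fitted model
b <- coef(regressor)

# Hypothetical speeds (mph), chosen only for illustration
new_speeds <- data.frame(speed = c(10, 15, 20))

# Prediction by hand: intercept + slope * speed
manual_pred <- unname(b["(Intercept)"] + b["speed"] * new_speeds$speed)

# The same prediction through predict()
model_pred <- unname(predict(regressor, newdata = new_speeds))

print(round(manual_pred, 2))
print(round(model_pred, 2))
```

Both vectors should agree. With the rounded coefficients shown above, a car travelling at 10 mph is predicted to need roughly 21.7 ft to stop.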
Two values stand out in the output above:
Intercept: the predicted stopping distance at a speed of zero. Its value is -17.579. A negative stopping distance is not physically meaningful; the intercept simply fixes where the fitted line crosses the vertical axis.
Slope: the change in the dependent variable for each unit change in the independent variable. Its value is 3.932, which implies that each additional 1 mph of speed increases the predicted stopping distance by 3.932 ft.
The fitted equation is: stopping distance (ft) = -17.579 + 3.932 × speed (mph)
Assessing the model
summary(regressor)
##
## Call:
## lm(formula = dist ~ speed, data = dataset_cars)
##
## Residuals:
##     Min      1Q  Median      3Q     Max
## -29.069  -9.525  -2.272   9.215  43.201
##
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept) -17.5791     6.7584  -2.601   0.0123 *
## speed         3.9324     0.4155   9.464 1.49e-12 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 15.38 on 48 degrees of freedom
## Multiple R-squared: 0.6511, Adjusted R-squared: 0.6438
## F-statistic: 89.57 on 1 and 48 DF, p-value: 1.49e-12
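The multiple R-squared reported in the summary above can be reproduced by hand, which makes clear what it measures. This sketch computes it two ways: as one minus the ratio of the residual sum of squares to the total sum of squares, and as the squared correlation between speed and distance (the two coincide for a regression with a single predictor). The variable names ss_res, ss_tot, r2_manual, r2_cor, and r2_summary are illustrative, not part of the original analysis.

```r
# Refit the model so this snippet is self-contained
dataset_cars <- cars
regressor <- lm(dist ~ speed, data = dataset_cars)

# Residual sum of squares: variation the model leaves unexplained
ss_res <- sum(resid(regressor)^2)

# Total sum of squares: variation of dist around its mean
ss_tot <- sum((dataset_cars$dist - mean(dataset_cars$dist))^2)

# R-squared as the explained share of the total variation
r2_manual <- 1 - ss_res / ss_tot

# With one predictor, R-squared equals the squared Pearson correlation
r2_cor <- cor(dataset_cars$speed, dataset_cars$dist)^2

# Value reported by summary()
r2_summary <- summary(regressor)$r.squared

print(c(manual = r2_manual, from_cor = r2_cor, from_summary = r2_summary))
```

All three numbers should match the 0.6511 shown in the summary output.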
As the summary above shows, speed explains 65.11 percent of the variation in stopping distance (multiple R-squared = 0.6511), and the slope is highly significant (p-value = 1.49e-12).
Creating a residual plot
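# Residuals against fitted values: look for curvature or changing spread
plot(fitted(regressor), resid(regressor))
# Normal Q-Q plot of the residuals, with a reference line
qqnorm(resid(regressor))
qqline(resid(regressor))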
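The residual plot is easier to read with a horizontal reference line at zero, and the Q-Q plot can be complemented by a numeric normality check. The sketch below adds both. The Shapiro-Wilk test is an addition made here for illustration and is not part of the original analysis; the axis labels and the names res and sw are likewise assumptions.

```r
# Refit the model so this snippet is self-contained
dataset_cars <- cars
regressor <- lm(dist ~ speed, data = dataset_cars)
res <- resid(regressor)

# Residuals vs fitted values, with a dashed reference line at zero;
# residuals scattered evenly around the line suggest the linear form is adequate
plot(fitted(regressor), res,
     xlab = "Fitted stopping distance (ft)",
     ylab = "Residual (ft)")
abline(h = 0, lty = 2)

# Shapiro-Wilk test of normality as a numeric companion to the Q-Q plot
# (an added check, not in the original analysis)
sw <- shapiro.test(res)
print(sw)
```

A small p-value from the test would indicate that the residuals depart from normality, which is worth comparing with how far the points stray from the line in the Q-Q plot.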