Computational Mathematics - Regression Analysis II

Euclides Rodriguez

2022-04-10

Introduction

Using the “cars” dataset in R, build a linear model for stopping distance as a function of speed and analyze the regression model (visualization, quality evaluation of the model, and residual analysis).

Libraries

library(tidyverse)

Data

data(cars)
head(cars, 6)
##   speed dist
## 1     4    2
## 2     4   10
## 3     7    4
## 4     7   22
## 5     8   16
## 6     9   10
ggplot(data = cars, aes(x = speed, y = dist)) +
  geom_point(color = "steelblue")+
  theme_minimal()+
  labs(x = "Speed", y = "Stopping Distance")

Linear Regression Model - One-Factor

cars.lm <- lm(dist~speed, cars)
summary(cars.lm)
## 
## Call:
## lm(formula = dist ~ speed, data = cars)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -29.069  -9.525  -2.272   9.215  43.201 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -17.5791     6.7584  -2.601   0.0123 *  
## speed         3.9324     0.4155   9.464 1.49e-12 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 15.38 on 48 degrees of freedom
## Multiple R-squared:  0.6511, Adjusted R-squared:  0.6438 
## F-statistic: 89.57 on 1 and 48 DF,  p-value: 1.49e-12
cars.res <- resid(cars.lm)

## Residual Analysis Plot (residuals against observed stopping distance)
ggplot(data=cars,aes(x=dist, y=cars.res))+
  geom_hline(yintercept = 0)+
  geom_point(color="steelblue")+
  theme_minimal()+
  labs(x = "Stopping Distance", y = "Residuals")

##QQ Plot
ggplot(cars,aes(sample=cars.res))+
  stat_qq(color="steelblue")+
  stat_qq_line()+
  theme_minimal()+
  labs(x = "Theoretical Quantiles", y = "Sample Quantiles")

Analysis

Summary Statistics: The residual summary of the linear model shows a median of -2.27, which is close to zero relative to the residual standard error of 15.38. In addition, the 1st and 3rd quartiles have similar magnitudes (-9.53 and 9.22), but the minimum and maximum do not (-29.07 and 43.20). This suggests that the residuals are right-skewed and depart from normality in the tails.
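The residual summary discussed above can be reproduced directly. The following is a minimal sketch that refits the model so it runs on its own; the names `res_mean` and `res_quantiles` are illustrative and not part of the original analysis.

```r
# Sketch only: refit the model so this block is self-contained.
data(cars)
cars.lm <- lm(dist ~ speed, data = cars)
cars.res <- resid(cars.lm)

# With an intercept in the model, least-squares residuals average to zero
# by construction, so the mean carries no diagnostic information.
res_mean <- mean(cars.res)

# Min, quartiles, and max: compare |1Q| with |3Q| and |Min| with |Max|
# to judge symmetry of the residual distribution.
res_quantiles <- quantile(cars.res, probs = c(0, 0.25, 0.5, 0.75, 1))

print(res_mean)
print(round(res_quantiles, 3))
```

Because the mean is zero by construction, the median and the quartiles are the more informative values when checking symmetry.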

The ratio of the intercept estimate to its standard error (the t value) is about 2.6 in magnitude, which is less than five, indicating that this parameter is estimated with considerable uncertainty. The ratio of the slope estimate to its standard error is approximately 9.5, indicating comparatively little variability in this parameter.
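As a rough illustration of the ratios described above, the coefficient table can be pulled out of the model summary and the estimate-to-standard-error ratio computed by hand. The 95% confidence intervals from `confint()` are an addition not used in the original analysis; they express the same uncertainty on the scale of the coefficients. The names `coef_table`, `t_ratio`, and `ci_95` are illustrative.

```r
# Sketch only: refit the model so this block is self-contained.
data(cars)
cars.lm <- lm(dist ~ speed, data = cars)

# Coefficient table: Estimate, Std. Error, t value, Pr(>|t|)
coef_table <- coef(summary(cars.lm))

# Ratio of each estimate to its standard error (this is the t value)
t_ratio <- coef_table[, "Estimate"] / coef_table[, "Std. Error"]

# 95% confidence intervals (an assumption of this sketch, not in the report)
ci_95 <- confint(cars.lm, level = 0.95)

print(round(t_ratio, 3))
print(ci_95)
```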

The p-value for speed (1.49e-12) is approximately zero, indicating that speed is statistically significant in the model. The multiple R-squared of 0.6511 means that speed explains about 65% of the variation in stopping distance.
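The significance and fit figures can also be extracted programmatically rather than read off the printed summary. This is a minimal sketch; the variable names are illustrative. In a one-factor model the overall F-test and the t-test on the slope are equivalent, so their p-values should agree.

```r
# Sketch only: refit the model so this block is self-contained.
data(cars)
cars.lm <- lm(dist ~ speed, data = cars)
fit_summary <- summary(cars.lm)

# p-value of the t-test on the slope
slope_p <- coef(fit_summary)["speed", "Pr(>|t|)"]

# Share of the variance in dist explained by speed
r_squared <- fit_summary$r.squared

# p-value of the overall F-test, from the stored statistic and its
# numerator and denominator degrees of freedom
fstat <- fit_summary$fstatistic
f_p <- pf(fstat[["value"]], fstat[["numdf"]], fstat[["dendf"]],
          lower.tail = FALSE)

print(slope_p)
print(r_squared)
print(f_p)
```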

Lastly, we review the residual plots. In the first plot (Residual Analysis Plot), the residuals do not show constant variability. Beyond a stopping distance of about 65 ft, the residuals trend above zero. In the second plot (Q-Q Plot), the residuals follow the normal line up to about theoretical quantile 1. Beyond this quantile, the residuals depart from the Q-Q line in the upper tail.
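Two complementary checks may help confirm the visual reading. The sketch below is not part of the original analysis: it plots the residuals against the fitted values (a common alternative to plotting against the observed stopping distance, since residuals are always correlated with the observed response) and runs a Shapiro-Wilk test as a numeric companion to the Q-Q plot. The names `diag_df`, `res_fitted_plot`, and `sw_test` are illustrative.

```r
# Sketch only: refit the model so this block is self-contained.
library(ggplot2)

data(cars)
cars.lm <- lm(dist ~ speed, data = cars)

# Collect fitted values and residuals in one data frame for plotting
diag_df <- data.frame(fitted = fitted(cars.lm),
                      residual = resid(cars.lm))

# Residuals against fitted values: a widening spread suggests
# non-constant variance
res_fitted_plot <- ggplot(diag_df, aes(x = fitted, y = residual)) +
  geom_hline(yintercept = 0) +
  geom_point(color = "steelblue") +
  theme_minimal() +
  labs(x = "Fitted Stopping Distance", y = "Residuals")
print(res_fitted_plot)

# Shapiro-Wilk test: a small p-value is evidence against normal residuals
sw_test <- shapiro.test(diag_df$residual)
print(sw_test)
```

The test result should be read alongside the Q-Q plot rather than in place of it, since the plot shows where the departure from normality occurs.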

Conclusion

Based on the analysis above, we conclude that speed alone is not a sufficient predictor of stopping distance in a one-factor linear model.