SAlam_Assign11

Using the “cars” dataset in R, build a linear model for stopping distance as a function of speed and replicate the analysis from chapter 3 of your textbook (visualization, quality evaluation of the model, and residual analysis).

library(tidyverse)
head(cars)

##   speed dist
## 1     4    2
## 2     4   10
## 3     7    4
## 4     7   22
## 5     8   16
## 6     9   10

Visualization

The scatter plot below shows a clear positive, roughly linear trend between speed and stopping distance in the cars dataset.

cars %>% 
  ggplot(aes(speed, dist)) +
  geom_point() +
  geom_smooth(method = "lm", se = FALSE) +
  labs(title = "Cars",
       x = "Speed", y = "Distance") +
  theme_minimal()

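# Note: this fits speed as a function of dist; the assignment's stated
# direction (distance as a function of speed) would be lm(dist ~ speed, data = cars).
cars_lm <- lm(speed ~ dist, data = cars)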
cars_lm

## 
## Call:
## lm(formula = speed ~ dist, data = cars)
## 
## Coefficients:
## (Intercept)         dist  
##      8.2839       0.1656
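For comparison, here is a minimal sketch of the fit in the direction the assignment describes, with stopping distance as the response. The name `dist_lm` is hypothetical and not part of the original analysis. The sketch also checks that R-squared is the same in both directions, since in simple linear regression it equals the squared correlation between the two variables.

```r
# Hypothetical name: dist_lm is not part of the original analysis.
# Fit stopping distance as a function of speed (the assignment's stated direction).
dist_lm <- lm(dist ~ speed, data = cars)

# Intercept and slope on the distance scale.
coef(dist_lm)

# In simple linear regression, R-squared is the squared correlation,
# so it does not depend on which variable is treated as the response.
r2_dist  <- summary(dist_lm)$r.squared
r2_speed <- summary(lm(speed ~ dist, data = cars))$r.squared
all.equal(r2_dist, r2_speed)
```

The slope and intercept change with the direction of the fit, so the coefficients shown here are not interchangeable with those of `cars_lm`.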

Quality Evaluation of the Model

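In the model summary below, the minimum and maximum residuals have roughly similar magnitudes (-7.53 and 6.42), as do the first and third quartiles (-2.16 and 2.44), and the median (0.36) is close to zero. This suggests the residuals are roughly symmetric around zero, but let's do some more evaluation. The standard error of the dist coefficient (0.0175) is about 9.5 times smaller than the coefficient itself (0.1656), which matches the t value of 9.464. The p-value of 1.49e-12 shows that a slope this large would be very unlikely if there were no linear relationship between the two variables. Lastly, the multiple R-squared is 0.65, which means this model explains about 65% of the variation in the response variable. Overall, I would say this is a good model.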

summary(cars_lm)

## 
## Call:
## lm(formula = speed ~ dist, data = cars)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -7.5293 -2.1550  0.3615  2.4377  6.4179 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  8.28391    0.87438   9.474 1.44e-12 ***
## dist         0.16557    0.01749   9.464 1.49e-12 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 3.156 on 48 degrees of freedom
## Multiple R-squared:  0.6511, Adjusted R-squared:  0.6438 
## F-statistic: 89.57 on 1 and 48 DF,  p-value: 1.49e-12
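The figures quoted in this section can be recomputed from the summary object rather than read off the printout. The following is a minimal sketch; the names `fit`, `s`, `coef_table`, and `ratio` are illustrative and not part of the original analysis.

```r
# Refit the same model as above (speed as a function of dist).
fit <- lm(speed ~ dist, data = cars)
s <- summary(fit)

# Coefficient table: Estimate, Std. Error, t value, Pr(>|t|).
coef_table <- coef(s)

# Ratio of each estimate to its standard error; this is the t value by definition.
ratio <- coef_table[, "Estimate"] / coef_table[, "Std. Error"]
ratio
all.equal(ratio, coef_table[, "t value"])

# Multiple R-squared, and the squared correlation it equals in simple regression.
s$r.squared
cor(cars$speed, cars$dist)^2
```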

Residual Analysis

In the residual plot below, we see that the variance of the residuals is not uniform, which indicates that our explanatory variable probably does not fully explain the data. But if we look at the quantile-quantile plot, we see that the residuals are approximately normally distributed. Therefore, I would say overall this is a good model.

cars_lm %>% 
  ggplot(aes(fitted(cars_lm), resid(cars_lm))) +
  geom_point() +
  geom_smooth(method = "lm", se = FALSE) +
  labs(title = "Residual Analysis",
       x = "Fitted Line", y = "Residuals") +
  theme_minimal()

cars_lm %>% 
  ggplot(aes(sample = resid(cars_lm))) +
  stat_qq() +
  stat_qq_line() +
  labs(title = "Q-Q Plot") +
  theme_minimal()
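The visual impressions from the two plots can be supplemented with simple numeric checks. The sketch below is one possible approach and not part of the original analysis: it runs a Shapiro-Wilk normality test on the residuals and compares the residual spread between the lower and upper halves of the fitted values. The names `fit`, `res`, `sw`, `upper_half`, and `spread` are illustrative, and the median split is an arbitrary choice made for this example.

```r
# Refit the same model as above (speed as a function of dist).
fit <- lm(speed ~ dist, data = cars)
res <- resid(fit)

# Shapiro-Wilk test: a small p-value would be evidence against normal residuals.
sw <- shapiro.test(res)
sw$p.value

# Rough check for non-constant variance: standard deviation of the residuals
# below and above the median fitted value (FALSE = lower half, TRUE = upper half).
upper_half <- fitted(fit) > median(fitted(fit))
spread <- tapply(res, upper_half, sd)
spread
```

Clearly different spreads in the two halves would be consistent with the non-uniform variance seen in the residual plot, although a median split is only a coarse diagnostic.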

Saayed Alam

April 14, 2019
