Data 607 - Assignment #11

Cars Analysis

Using the cars dataset in R, build a linear model for stopping distance as a function of speed and replicate the analysis of your textbook chapter 3 (visualization, quality evaluation of the model, and residual analysis).

attach(cars)

summary(cars)

##      speed           dist       
##  Min.   : 4.0   Min.   :  2.00  
##  1st Qu.:12.0   1st Qu.: 26.00  
##  Median :15.0   Median : 36.00  
##  Mean   :15.4   Mean   : 42.98  
##  3rd Qu.:19.0   3rd Qu.: 56.00  
##  Max.   :25.0   Max.   :120.00

plot(cars$speed, cars$dist, xlab='Speed (mph)', ylab='Stopping Distance (ft)', 
     main='Stopping Distance vs. Speed')

Linear Model

cars_lm <- lm(cars$dist ~ cars$speed)
cars_lm

## 
## Call:
## lm(formula = cars$dist ~ cars$speed)
## 
## Coefficients:
## (Intercept)   cars$speed  
##     -17.579        3.932
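As a rough illustration of what these coefficients mean, the fitted line can be evaluated by hand at a chosen speed. The sketch below refits the same model so that it runs on its own. The 20 mph value and the names `speed_new` and `dist_hat` are illustrative assumptions, not part of the original analysis.

```r
# Refit the model exactly as above so this block is self-contained
cars_lm <- lm(cars$dist ~ cars$speed)

# Extract the intercept and slope as plain numbers
b <- unname(coef(cars_lm))

# Hypothetical speed (mph), chosen only for illustration
speed_new <- 20

# Predicted stopping distance (ft) from the fitted line
dist_hat <- b[1] + b[2] * speed_new
round(dist_hat, 2)
```

With the coefficients shown above, this works out to roughly 61 ft, since -17.579 + 3.932 * 20 is about 61.07.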

plot(cars$speed, cars$dist, xlab='Speed (mph)', ylab='Stopping Distance (ft)', 
     main='Stopping Distance vs. Speed')
abline(cars_lm)

There appears to be a positive correlation between the two variables, but let us evaluate the linear model more formally.
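One way to put a number on the visual impression is the Pearson correlation coefficient. This is a small sketch using base R's `cor()`; the variable name `r` is only for illustration.

```r
# Pearson correlation between speed and stopping distance
r <- cor(cars$speed, cars$dist)
round(r, 3)

# In a simple linear regression with one predictor,
# the squared correlation equals the model's multiple R-squared
round(r^2, 4)
```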

summary(cars_lm)

## 
## Call:
## lm(formula = cars$dist ~ cars$speed)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -29.069  -9.525  -2.272   9.215  43.201 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -17.5791     6.7584  -2.601   0.0123 *  
## cars$speed    3.9324     0.4155   9.464 1.49e-12 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 15.38 on 48 degrees of freedom
## Multiple R-squared:  0.6511, Adjusted R-squared:  0.6438 
## F-statistic: 89.57 on 1 and 48 DF,  p-value: 1.49e-12

The median of the residuals (-2.27) is fairly close to zero, and the first and third quartiles (-9.53 and 9.22) are approximately equal in magnitude. The minimum and maximum (-29.07 and 43.20) are less symmetric.

The standard error of the \(speed\) variable (0.4155) is roughly 9 times smaller than the corresponding coefficient (3.9324).

By contrast, the standard error of the intercept (6.7584) is only about 2.6 times smaller than the magnitude of its estimate (-17.5791), so the intercept is estimated with more relative variability.

The \(speed\) coefficient is highly significant (p-value of 1.49e-12).

The intercept is less significant (p-value of 0.0123), passing only the 0.05 level.
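The ratios of estimate to standard error discussed above are exactly what the `t value` column reports. The following sketch recomputes them from the coefficient table. It refits the same model so the block runs on its own, and the names `coefs` and `ratio` are illustrative.

```r
# Refit the model exactly as above so this block is self-contained
cars_lm <- lm(cars$dist ~ cars$speed)

# Coefficient table: Estimate, Std. Error, t value, Pr(>|t|)
coefs <- summary(cars_lm)$coefficients

# Ratio of each estimate to its standard error
ratio <- coefs[, "Estimate"] / coefs[, "Std. Error"]
round(ratio, 3)

# The ratios should agree with the reported t values
all.equal(unname(ratio), unname(coefs[, "t value"]))
```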

Furthermore, the multiple \(R^2\) of \(0.6511\) indicates that the model explains about \(65.11\)% of the variation in stopping distance.
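To make the meaning of \(R^2\) concrete, it can be recomputed as one minus the ratio of the residual sum of squares to the total sum of squares. This is a minimal sketch that refits the same model; the names `ss_res`, `ss_tot`, and `r_squared` are illustrative.

```r
# Refit the model exactly as above so this block is self-contained
cars_lm <- lm(cars$dist ~ cars$speed)

# Residual sum of squares: variation left unexplained by the model
ss_res <- sum(residuals(cars_lm)^2)

# Total sum of squares: variation of dist around its mean
ss_tot <- sum((cars$dist - mean(cars$dist))^2)

# Proportion of variation explained by the model
r_squared <- 1 - ss_res / ss_tot
round(r_squared, 4)
```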

plot(cars_lm$fitted.values, cars_lm$residuals, xlab='Fitted Values', ylab='Residuals')
abline(0,0)

The residuals at the extreme fitted values do not appear to show the same variance as the rest, which would suggest non-constant variance. However, the pattern is not very clear.
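Because the plot is ambiguous, a numerical check may help. The sketch below is not part of the original analysis: it hand-rolls a studentized (Koenker) Breusch-Pagan style statistic in base R by regressing the squared residuals on speed and comparing \(n R^2\) to a chi-squared distribution with one degree of freedom. The names `e2`, `aux`, `bp_stat`, and `bp_p` are illustrative.

```r
# Refit the model exactly as above so this block is self-contained
cars_lm <- lm(cars$dist ~ cars$speed)

# Squared residuals serve as a proxy for the error variance
e2 <- residuals(cars_lm)^2

# Auxiliary regression: does the squared residual depend on speed?
aux <- lm(e2 ~ cars$speed)

# Test statistic: n times the R-squared of the auxiliary regression
n <- length(e2)
bp_stat <- n * summary(aux)$r.squared

# Upper-tail p-value from a chi-squared distribution with 1 df
bp_p <- pchisq(bp_stat, df = 1, lower.tail = FALSE)
c(statistic = bp_stat, p.value = bp_p)
```

A small p-value would point toward non-constant variance, while a large one would be consistent with the plot not showing a clear pattern.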

qqnorm(cars_lm$residuals)
qqline(cars_lm$residuals)

Although there are again some departures at the extremes, the normal Q-Q plot of the residuals appears to follow the theoretical line, so the residuals are reasonably normally distributed.
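The visual reading of the Q-Q plot could be complemented with a formal normality test. As one possible sketch (not part of the original analysis), base R's `shapiro.test()` can be applied to the residuals. The name `sw` is illustrative.

```r
# Refit the model exactly as above so this block is self-contained
cars_lm <- lm(cars$dist ~ cars$speed)

# Shapiro-Wilk test: the null hypothesis is that the residuals are normal
sw <- shapiro.test(residuals(cars_lm))

# Test statistic W (values near 1 are consistent with normality)
sw$statistic

# A small p-value is evidence against normality
sw$p.value
```

With only 50 observations, the result is best read alongside the Q-Q plot rather than as a replacement for it.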

Joseph Simone

11/5/2019
