Executive Summary

This analysis uses the mtcars data set, a collection of 32 cars, to explore the relationship between transmission type and fuel economy in miles per gallon (MPG), the outcome. It addresses two questions: whether an automatic or a manual transmission is better for MPG, and how large the MPG difference between the two transmission types is.

Exploratory Data Analyses

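Note: Transmission is coded as 0 = automatic, 1 = manual.

A boxplot (Figure 1 in the Appendix) is used to explore the relationship between MPG and transmission type. It shows that, on average, cars with manual transmissions get more miles per gallon than cars with automatic transmissions. The summary below gives a mean of 24.39 MPG for manual cars against 17.15 MPG for automatic cars.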

mtcars_data <- mtcars
with(mtcars_data, tapply(mpg, am, summary))
## $`0`
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   10.40   14.95   17.30   17.15   19.20   24.40 
## 
## $`1`
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   15.00   21.00   22.80   24.39   30.40   33.90
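As a quick check on the group summaries above, the gap between the two means can be tested with Welch's two-sample t-test. This is a minimal sketch and not part of the original analysis; the names welch and mpg_gap are illustrative.

```r
# Welch two-sample t-test of MPG by transmission type
# (am: 0 = automatic, 1 = manual); does not assume equal variances
welch <- t.test(mpg ~ am, data = mtcars)

# Group means, in the order automatic, manual
print(welch$estimate)

# Manual minus automatic, in MPG
mpg_gap <- unname(diff(welch$estimate))
print(mpg_gap)

# p-value for the null hypothesis of equal mean MPG
print(welch$p.value)
```

The gap corresponds to the difference between the two group means in the summary output (24.39 - 17.15).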

Regression Model Selection

The following is a linear model analysis to determine whether there is a significant relationship between MPG and transmission type.

fit0 <- lm(mpg ~ factor(am), data = mtcars)
summary(fit0)$coefficients
##              Estimate Std. Error   t value     Pr(>|t|)
## (Intercept) 17.147368   1.124603 15.247492 1.133983e-15
## factor(am)1  7.244939   1.764422  4.106127 2.850207e-04
summary(fit0)$r.squared
## [1] 0.3597989

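The factor(am)1 coefficient is the estimated difference in mean MPG between manual and automatic transmissions: manual cars average about 7.24 MPG more than the automatic mean of 17.15 MPG (the intercept). The low p-value (about 0.0003) indicates a significant relationship between transmission type and MPG. The low R-squared value (0.36) means that transmission type alone explains only about 36% of the variance in MPG, which suggests that other variables, possibly confounders, account for much of the remaining variability.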
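To make the meaning of the two coefficients concrete, the sketch below checks that the intercept equals the mean MPG of automatic cars and that the factor(am)1 coefficient equals the manual-minus-automatic difference in means. The names group_means, intercept_matches, and slope_matches are illustrative and not part of the original analysis.

```r
fit0 <- lm(mpg ~ factor(am), data = mtcars)

# Mean MPG per transmission type (0 = automatic, 1 = manual)
group_means <- tapply(mtcars$mpg, mtcars$am, mean)

# Intercept should equal the automatic-group mean
intercept_matches <- isTRUE(all.equal(
  unname(coef(fit0)["(Intercept)"]),
  unname(group_means["0"])
))

# factor(am)1 should equal the difference in group means
slope_matches <- isTRUE(all.equal(
  unname(coef(fit0)["factor(am)1"]),
  unname(group_means["1"] - group_means["0"])
))

print(c(intercept = intercept_matches, slope = slope_matches))
```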

Model Fitting

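Five nested models were compared to identify the right fit. Each model adds one predictor to the previous one: number of cylinders (cyl), horsepower (hp), weight (wt), and quarter-mile time (qsec).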

fit0 <- lm(mpg ~ factor(am), data = mtcars)
fit1 <- lm(mpg ~ factor(am) + factor(cyl), data = mtcars)
fit2 <- lm(mpg ~ factor(am) + factor(cyl) + hp, data = mtcars)
fit3 <- lm(mpg ~ factor(am) + factor(cyl) + hp + wt, data = mtcars)
fit4 <- lm(mpg ~ factor(am) + factor(cyl) + hp + wt + qsec, data = mtcars)
anova(fit0, fit1, fit2, fit3, fit4)
## Analysis of Variance Table
## 
## Model 1: mpg ~ factor(am)
## Model 2: mpg ~ factor(am) + factor(cyl)
## Model 3: mpg ~ factor(am) + factor(cyl) + hp
## Model 4: mpg ~ factor(am) + factor(cyl) + hp + wt
## Model 5: mpg ~ factor(am) + factor(cyl) + hp + wt + qsec
##   Res.Df    RSS Df Sum of Sq       F    Pr(>F)    
## 1     30 720.90                                   
## 2     28 264.50  2    456.40 39.6232 1.772e-08 ***
## 3     27 197.20  1     67.30 11.6849  0.002166 ** 
## 4     26 151.03  1     46.17  8.0172  0.009017 ** 
## 5     25 143.98  1      7.04  1.2230  0.279293    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
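# Transmission coefficient (row 2 of each coefficient table) for fit0 to fit4
rbind(summary(fit0)$coefficients[2, ],
      summary(fit1)$coefficients[2, ],
      summary(fit2)$coefficients[2, ],
      summary(fit3)$coefficients[2, ],
      summary(fit4)$coefficients[2, ])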
##      Estimate Std. Error  t value     Pr(>|t|)
## [1,] 7.244939   1.764422 4.106127 0.0002850207
## [2,] 2.559954   1.297579 1.972869 0.0584571679
## [3,] 4.157856   1.256550 3.308946 0.0026598064
## [4,] 1.809211   1.396305 1.295714 0.2064596738
## [5,] 2.832696   1.670199 1.696022 0.1023014560
# Adjusted R-squared for fit0 to fit4
rbind(summary(fit0)$adj.r.squared,
      summary(fit1)$adj.r.squared,
      summary(fit2)$adj.r.squared,
      summary(fit3)$adj.r.squared,
      summary(fit4)$adj.r.squared)
##           [,1]
## [1,] 0.3384589
## [2,] 0.7399447
## [3,] 0.7989306
## [4,] 0.8400875
## [5,] 0.8414477
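The F statistics in the analysis of variance table can be reproduced by hand, which clarifies what each row tests. When anova() is given several lm fits, each row's F is the mean square gained by the added terms divided by the residual mean square of the largest model (here Model 5). For Model 2 that is (456.40 / 2) / (143.98 / 25), or about 39.62. A minimal sketch, in which tab, scale_ms, and f_by_hand are illustrative names:

```r
# Refit the five nested models so this block stands alone
fit0 <- lm(mpg ~ factor(am), data = mtcars)
fit1 <- lm(mpg ~ factor(am) + factor(cyl), data = mtcars)
fit2 <- lm(mpg ~ factor(am) + factor(cyl) + hp, data = mtcars)
fit3 <- lm(mpg ~ factor(am) + factor(cyl) + hp + wt, data = mtcars)
fit4 <- lm(mpg ~ factor(am) + factor(cyl) + hp + wt + qsec, data = mtcars)

tab <- anova(fit0, fit1, fit2, fit3, fit4)

# Residual mean square of the largest model (smallest residual df)
largest <- which.min(tab$Res.Df)
scale_ms <- tab$RSS[largest] / tab$Res.Df[largest]

# F for each row: (sum of squares gained / df gained) / residual mean square
f_by_hand <- (tab$`Sum of Sq` / tab$Df) / scale_ms

# Side-by-side comparison; the first row is NA because nothing is added there
print(cbind(by_hand = f_by_hand, from_anova = tab$F))
```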

Conclusion

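All five models estimate a higher MPG for manual transmissions, although the transmission coefficient is significant at the 0.05 level only in Models 1 and 3. The preferred fit is Model 3 (mpg ~ factor(am) + factor(cyl) + hp): its transmission coefficient has the lowest standard error (1.26) and a p-value below 0.05 (0.0027), and the model has a reasonably high adjusted R-squared (0.80). Under this model, manual cars are estimated to get about 4.16 MPG more than automatic cars with the same number of cylinders and the same horsepower. The residual plots in the Appendix, which are drawn for the model containing all predictors, suggest some heteroskedasticity.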
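To quantify the transmission effect under Model 3, a 95% confidence interval for its factor(am)1 coefficient can be computed with confint(). This is a minimal sketch; the model is refit here so the block stands alone, and est_am and ci_am are illustrative names.

```r
# Model 3: transmission, cylinders, and horsepower
fit2 <- lm(mpg ~ factor(am) + factor(cyl) + hp, data = mtcars)

# Estimated MPG difference, manual minus automatic,
# holding cylinders and horsepower fixed
est_am <- unname(coef(fit2)["factor(am)1"])
print(est_am)

# 95% confidence interval for that difference
ci_am <- confint(fit2, "factor(am)1", level = 0.95)
print(ci_am)
```

An interval that excludes zero is consistent with the p-value below 0.05 reported for this coefficient.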

…………………………………………………………………………….

Appendix

library(ggplot2)
mtcars_data <- mtcars
# Label the transmission codes (0 = automatic, 1 = manual) for plotting
mtcars_data$TransmissionType <- factor(mtcars_data$am, labels = c("Automatic", "Manual"))

# Boxplot of MPG by transmission type
g <- ggplot(data = mtcars_data, aes(x = TransmissionType, y = mpg, fill = TransmissionType)) + geom_boxplot()
g <- g + labs(title = "Figure 1: MPG by Transmission Type (Automatic vs. Manual)")
g

Plot Residuals

par(mfrow = c(2, 2))  # 2 x 2 grid for the four diagnostic plots
fit <- lm(mpg ~ ., data = mtcars)  # full model: all other variables as predictors
plot(fit)
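The same diagnostics can be drawn for Model 3, the model preferred in the Conclusion. As a rough numeric companion to the scale-location panel, the sketch below also correlates the absolute standardized residuals with the fitted values. That correlation is an informal check assumed here for illustration, not a formal heteroskedasticity test, and the names fit2 and spread_cor are illustrative.

```r
# Model 3: transmission, cylinders, and horsepower
fit2 <- lm(mpg ~ factor(am) + factor(cyl) + hp, data = mtcars)

# Four standard lm diagnostic plots in a 2 x 2 grid
par(mfrow = c(2, 2))
plot(fit2)

# Informal spread check: do residuals grow or shrink with the fitted values?
# Values far from zero hint at non-constant variance.
spread_cor <- cor(abs(rstandard(fit2)), fitted(fit2))
print(spread_cor)
```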