Executive Summary

  1. The analysis finds no evidence that the type of transmission has an impact on mpg.
  2. However, bigger cars (higher weight and bigger engines) tend to be shipped with an automatic transmission, while smaller cars tend to be shipped with a manual transmission.

Remarks

Analysis - Fitting the first model

At first glance, it looks as if cars with automatic transmission have much lower mpg than cars with manual transmission:

## factor(anName)automatic    factor(anName)manual 
##                17.14737                24.39231
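
The group means above can be reproduced with a model without an intercept. This is only a sketch: the report does not show how `anName` was built, so the copy `cars` and the labelling of `am` (0 = automatic, 1 = manual, as documented for `mtcars`) are assumptions.

```r
# Work on a copy so the built-in dataset stays untouched (`cars` is a hypothetical name)
cars <- mtcars
# Assumed construction of anName: a labelled version of am (0 = automatic, 1 = manual)
cars$anName <- factor(cars$am, levels = c(0, 1), labels = c("automatic", "manual"))
# Dropping the intercept (- 1) makes each coefficient the mean mpg of its group
fit1 <- lm(mpg ~ factor(anName) - 1, data = cars)
coef(fit1)
```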

However, other variables correlate with mpg much more strongly than the type of transmission (am).
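
One quick way to check this claim is to rank all variables by the absolute value of their correlation with mpg. The following is a minimal sketch; the object name `mpg_cor` is illustrative and not taken from the original analysis.

```r
# Correlation of every variable in mtcars with mpg
mpg_cor <- cor(mtcars)["mpg", ]
# Drop the trivial correlation of mpg with itself
mpg_cor <- mpg_cor[names(mpg_cor) != "mpg"]
# Sort by absolute strength, strongest first
mpg_cor[order(abs(mpg_cor), decreasing = TRUE)]
```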

Choosing variables to include in our second model

  • Higher weight plausibly means that more power is needed to move the vehicle, which reduces mpg
  • The specifics of the engine probably also contribute a lot to mpg; however, according to our subject matter expert (Wikipedia), a few characteristics should be ruled out:
    • vs depends mostly on available space and manufacturing effort
    • cyl, disp, hp and drat are each highly correlated with one another (see the correlation matrix below)
    • to limit the increase in the variance inflation factors, we restrict the engine variables in our model to ‘cyl’
##             cyl       disp         hp       drat
## cyl   1.0000000  0.9020329  0.8324475 -0.6999381
## disp  0.9020329  1.0000000  0.7909486 -0.7102139
## hp    0.8324475  0.7909486  1.0000000 -0.4487591
## drat -0.6999381 -0.7102139 -0.4487591  1.0000000
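
The correlation matrix above can be obtained from the built-in `mtcars` data roughly as follows. The names `engine_vars` and `engine_cor` are illustrative only.

```r
# Engine-related variables considered in the text
engine_vars <- c("cyl", "disp", "hp", "drat")
# Pairwise Pearson correlations between these variables
engine_cor <- cor(mtcars[, engine_vars])
engine_cor
```
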
fit2 <- lm(mpg ~ factor(am) + wt + cyl, data = mtcars)
anova(fit1, fit2)
## Analysis of Variance Table
## 
## Model 1: mpg ~ factor(anName) - 1
## Model 2: mpg ~ factor(am) + wt + cyl
##   Res.Df    RSS Df Sum of Sq      F    Pr(>F)    
## 1     30 720.90                                  
## 2     28 191.05  2    529.85 38.828 8.428e-09 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

The p-value, Pr(>F) = 8.4e-09, is far below 0.05, so we reject the null hypothesis that adding these variables does not reduce the residual sum of squares. Our second model is much better at estimating mpg; the difference between automatic and manual transmission, however, has almost vanished:

##             automatic   manual changeInInterceptForManual
## (Intercept)  39.41793 39.59443                  0.1764932
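
The table above is not accompanied by code in the report. A plausible reconstruction, assuming it was derived from the coefficients of the second model, is sketched here; the helper name `b` is illustrative.

```r
fit2 <- lm(mpg ~ factor(am) + wt + cyl, data = mtcars)
b <- coef(fit2)
# The coefficient of factor(am)1 is the shift of the intercept for manual cars
data.frame(
  automatic = b[["(Intercept)"]],
  manual = b[["(Intercept)"]] + b[["factor(am)1"]],
  changeInInterceptForManual = b[["factor(am)1"]],
  row.names = "(Intercept)"
)
```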

Looking at the residuals, we see that they are not perfectly normally distributed, but the deviation is not strong enough to reject the hypothesis of normally distributed residuals at the 5% level:

## 
##  Shapiro-Wilk normality test
## 
## data:  fit2$residuals
## W = 0.93688, p-value = 0.06108
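
The test above can be reproduced with `shapiro.test()` from base R. In this sketch the result is stored in `sw` (an illustrative name) so that the statistic and p-value can be inspected separately.

```r
fit2 <- lm(mpg ~ factor(am) + wt + cyl, data = mtcars)
# Shapiro-Wilk test; the null hypothesis is that the residuals are normally distributed
sw <- shapiro.test(residuals(fit2))
sw
```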

The variance inflation factors of our second model are also low:

## factor(am)         wt        cyl 
##   1.924955   3.609011   2.584066
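
To illustrate what these numbers mean, the variance inflation factor of a predictor can be computed by hand as 1 / (1 - R²), where R² comes from regressing that predictor on the other predictors. The sketch below uses base R only and is not the code used in the report; `predictors` and `manual_vif` are illustrative names.

```r
# Predictors of the second model; am is used as numeric 0/1,
# which gives the same VIF as the two-level factor
predictors <- c("am", "wt", "cyl")
manual_vif <- sapply(predictors, function(p) {
  others <- setdiff(predictors, p)
  # Regress predictor p on the remaining predictors
  r2 <- summary(lm(reformulate(others, response = p), data = mtcars))$r.squared
  1 / (1 - r2)
})
manual_vif
```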

Fitting remaining variables - third model

To check that we did not miss important variables, we fit another model that now includes the variables that have not been ruled out by our SME.

fit3 <- lm(formula = mpg ~ factor(am) + wt + cyl + drat + carb + gear + qsec, data = mtcars)
anova(fit2, fit3)
## Analysis of Variance Table
## 
## Model 1: mpg ~ factor(am) + wt + cyl
## Model 2: mpg ~ factor(am) + wt + cyl + drat + carb + gear + qsec
##   Res.Df    RSS Df Sum of Sq     F Pr(>F)
## 1     28 191.05                          
## 2     24 154.92  4    36.123 1.399 0.2643
vif(fit3)  # vif() is provided by the car package
## factor(am)         wt        cyl       drat       carb       gear 
##   4.442749   5.423153  11.028771   3.325738   3.979143   5.165204 
##       qsec 
##   5.410065

We see that adding the remaining variables does not significantly improve the fit (p = 0.26), while the variance inflation factors increase considerably (up to about 11 for cyl).

We therefore use the second model to conclude our analysis.

Conclusion

The 95% confidence interval for the change from automatic to manual transmission in our second model includes 0. We therefore fail to reject the null hypothesis that mpg is the same for automatic and manual transmission.

##                 2.5 %   97.5 %
## factor(am)1 -2.495555 2.848541
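
The interval above can be obtained with `confint()` from base R. This is a sketch; `ci` is an illustrative name.

```r
fit2 <- lm(mpg ~ factor(am) + wt + cyl, data = mtcars)
# 95% confidence interval for the manual-vs-automatic coefficient only
ci <- confint(fit2, "factor(am)1", level = 0.95)
ci
# TRUE if the interval contains 0, i.e. no significant effect at the 5% level
ci[1, 1] < 0 && ci[1, 2] > 0
```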

If we split the dataset by transmission type, we see that we are actually comparing different types of cars. Cars with automatic transmission tend to weigh more and have bigger engines (and therefore lower mpg). Those are the factors that really affect mpg, not the type of transmission.

##                mpg      cyl     disp       hp     drat       wt     qsec
## automatic 17.14737 6.947368 290.3789 160.2632 3.286316 3.768895 18.18316
## manual    24.39231 5.076923 143.5308 126.8462 4.050000 2.411000 17.36000
##                  vs am     gear     carb
## automatic 0.3684211  0 3.210526 2.736842
## manual    0.5384615  1 4.384615 2.923077
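
One way to compute such a table of group means is sketched below. The report does not show the original code, so this is an assumption; `group_means` is an illustrative name.

```r
# Split the data by transmission type (am: 0 = automatic, 1 = manual)
# and take the mean of every column within each group
group_means <- t(sapply(split(mtcars, mtcars$am), colMeans))
rownames(group_means) <- c("automatic", "manual")
group_means
```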

sessionInfo() from R for reproducibility

## R version 3.4.0 (2017-04-21)
## Platform: x86_64-apple-darwin15.6.0 (64-bit)
## Running under: macOS Sierra 10.12.6
## 
## Matrix products: default
## BLAS: /Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRblas.0.dylib
## LAPACK: /Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRlapack.dylib
## 
## locale:
## [1] de_DE.UTF-8/de_DE.UTF-8/de_DE.UTF-8/C/de_DE.UTF-8/de_DE.UTF-8
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
## [1] car_2.1-5     corrplot_0.77 ggplot2_2.2.1
## 
## loaded via a namespace (and not attached):
##  [1] Rcpp_0.12.11       compiler_3.4.0     nloptr_1.0.4      
##  [4] plyr_1.8.4         tools_3.4.0        digest_0.6.12     
##  [7] lme4_1.1-14        evaluate_0.10.1    tibble_1.3.3      
## [10] gtable_0.2.0       nlme_3.1-131       lattice_0.20-35   
## [13] mgcv_1.8-17        rlang_0.1.1        Matrix_1.2-9      
## [16] yaml_2.1.14        parallel_3.4.0     SparseM_1.77      
## [19] stringr_1.2.0      knitr_1.16         MatrixModels_0.4-1
## [22] rprojroot_1.2      grid_3.4.0         nnet_7.3-12       
## [25] rmarkdown_1.6      minqa_1.2.4        magrittr_1.5      
## [28] backports_1.1.0    scales_0.4.1       htmltools_0.3.6   
## [31] MASS_7.3-47        splines_3.4.0      pbkrtest_0.4-7    
## [34] colorspace_1.3-2   labeling_0.3       quantreg_5.33     
## [37] stringi_1.1.5      lazyeval_0.2.0     munsell_0.4.3