At first glance, it looks as if cars with automatic transmission have much lower mpg than cars with manual transmission:
## factor(anName)automatic factor(anName)manual
## 17.14737 24.39231
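The two values above are per-group mean mpg, obtained from a model without an intercept. The following is a minimal sketch of how such a model could be fitted. The definition of `anName` is not shown in this report, so the `ifelse()` recoding and the `cars` copy below are assumptions made only for illustration.

```r
# Assumption: anName is a text label derived from am (0 = automatic, 1 = manual).
# The original definition is not shown, so this recoding is hypothetical.
cars <- mtcars
cars$anName <- ifelse(cars$am == 0, "automatic", "manual")

# Dropping the intercept (- 1) makes each coefficient the mean mpg of its group.
fit1 <- lm(mpg ~ factor(anName) - 1, data = cars)
group_means <- coef(fit1)
print(group_means)
```

Without the `- 1`, the second coefficient would instead be the difference between the two group means.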
However, other variables correlate with mpg much more strongly than the type of transmission (am) does:
##             cyl       disp         hp       drat
## cyl   1.0000000  0.9020329  0.8324475 -0.6999381
## disp  0.9020329  1.0000000  0.7909486 -0.7102139
## hp    0.8324475  0.7909486  1.0000000 -0.4487591
## drat -0.6999381 -0.7102139 -0.4487591  1.0000000
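To check the claim about correlation with mpg directly, one could rank a few candidate variables by the absolute value of their correlation with mpg. This is only a sketch; the choice of columns and the names `cors` and `ranked` are assumptions, not part of the original analysis.

```r
# Correlation of mpg with each candidate predictor (column selection is illustrative).
candidates <- c("am", "wt", "cyl", "disp", "hp", "drat")
cors <- sapply(mtcars[, candidates], function(x) cor(mtcars$mpg, x))

# Sort by strength of the linear association, ignoring the sign.
ranked <- cors[order(abs(cors), decreasing = TRUE)]
print(round(ranked, 3))
```

In this ranking, am should appear at the bottom of the list and wt at the top.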
fit2 <- lm(mpg ~ factor(am) + wt + cyl, data = mtcars)
anova(fit1, fit2)
## Analysis of Variance Table
##
## Model 1: mpg ~ factor(anName) - 1
## Model 2: mpg ~ factor(am) + wt + cyl
## Res.Df RSS Df Sum of Sq F Pr(>F)
## 1 30 720.90
## 2 28 191.05 2 529.85 38.828 8.428e-09 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Pr(>F) is 8.428e-09, well below 1e-05, so we reject the null hypothesis that adding wt and cyl does not reduce the residual sum of squares. The second model estimates mpg much better. The difference between automatic and manual transmission, however, almost vanishes:
## automatic manual changeInInterceptForManual
## (Intercept) 39.41793 39.59443 0.1764932
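The table above can be reproduced from the coefficients of the second model: the intercept applies to automatic cars, and the `factor(am)1` coefficient is the shift for manual cars. A minimal sketch, where the vector name `intercepts` is an assumption:

```r
fit2 <- lm(mpg ~ factor(am) + wt + cyl, data = mtcars)
b <- coef(fit2)

# Intercept for automatic cars, for manual cars, and the difference between them.
intercepts <- c(
  automatic = b[["(Intercept)"]],
  manual = b[["(Intercept)"]] + b[["factor(am)1"]],
  changeInInterceptForManual = b[["factor(am)1"]]
)
print(intercepts)
```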
The residuals are not perfectly normally distributed, but the deviation is not large enough to reject the hypothesis of normality at the 5% level:
##
## Shapiro-Wilk normality test
##
## data: fit2$residuals
## W = 0.93688, p-value = 0.06108
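The test output above corresponds to a call like the following. This is a sketch; the 0.05 threshold and the variable names `sw` and `reject_normality` are assumptions.

```r
fit2 <- lm(mpg ~ factor(am) + wt + cyl, data = mtcars)

# Shapiro-Wilk test: the null hypothesis is that the residuals are normally distributed.
sw <- shapiro.test(residuals(fit2))
print(sw)

# Assumed significance level of 5%.
reject_normality <- sw$p.value < 0.05
print(reject_normality)
```

A Q-Q plot of the residuals, for example with `qqnorm()` and `qqline()`, would be a useful visual complement to the test.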
The variance inflation factors of the second model are also moderate:
## factor(am) wt cyl
## 1.924955 3.609011 2.584066
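The report uses `vif()` from the car package. As a cross-check, the same quantity can be computed in base R: for each predictor, regress it on the other predictors and take 1 / (1 - R²). This sketch treats am as a numeric 0/1 column, which is equivalent for a two-level factor; the name `vif_manual` is an assumption.

```r
predictors <- c("am", "wt", "cyl")

# VIF_j = 1 / (1 - R^2_j), where R^2_j comes from regressing predictor j on the others.
vif_manual <- sapply(predictors, function(p) {
  others <- setdiff(predictors, p)
  r2 <- summary(lm(reformulate(others, response = p), data = mtcars))$r.squared
  1 / (1 - r2)
})
print(vif_manual)
```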
To check that we have not missed any important variables, we fit another model that also includes the variables our SME has not yet ruled out.
fit3 <- lm(formula = mpg ~ factor(am) + wt + cyl + drat + carb + gear + qsec, data = mtcars)
anova(fit2, fit3)
## Analysis of Variance Table
##
## Model 1: mpg ~ factor(am) + wt + cyl
## Model 2: mpg ~ factor(am) + wt + cyl + drat + carb + gear + qsec
## Res.Df RSS Df Sum of Sq F Pr(>F)
## 1 28 191.05
## 2 24 154.92 4 36.123 1.399 0.2643
vif(fit3)
## factor(am) wt cyl drat carb gear
## 4.442749 5.423153 11.028771 3.325738 3.979143 5.165204
## qsec
## 5.410065
Adding the remaining variables does not significantly improve the fit (Pr(>F) = 0.2643), while the variance inflation factors increase considerably (cyl rises to about 11).
We therefore use the second model to conclude our analysis.
The 95% confidence interval for the change from automatic to manual transmission in the second model includes 0. We therefore fail to reject the null hypothesis that mpg is the same for automatic and manual transmission once wt and cyl are taken into account.
## 2.5 % 97.5 %
## factor(am)1 -2.495555 2.848541
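The interval above can be obtained with `confint()`. A minimal sketch, in which `ci` and `contains_zero` are illustrative names:

```r
fit2 <- lm(mpg ~ factor(am) + wt + cyl, data = mtcars)

# 95% confidence interval for the manual-vs-automatic coefficient only.
ci <- confint(fit2, "factor(am)1", level = 0.95)
print(ci)

# The effect is not distinguishable from zero if the interval spans 0.
contains_zero <- ci[1, "2.5 %"] < 0 && ci[1, "97.5 %"] > 0
print(contains_zero)
```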
If we split the dataset by transmission type, we see that we are actually comparing different types of cars. Cars with automatic transmission tend to be heavier and to have bigger engines (and therefore lower mpg). Those are the factors that really affect mpg, not the type of transmission.
## mpg cyl disp hp drat wt qsec
## automatic 17.14737 6.947368 290.3789 160.2632 3.286316 3.768895 18.18316
## manual 24.39231 5.076923 143.5308 126.8462 4.050000 2.411000 17.36000
## vs am gear carb
## automatic 0.3684211 0 3.210526 2.736842
## manual 0.5384615 1 4.384615 2.923077
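A comparison like the one above can be produced with `aggregate()`. The sketch below restricts itself to a few columns for readability; the column selection and the name `means` are assumptions.

```r
# Mean of selected variables per transmission type (am: 0 = automatic, 1 = manual).
means <- aggregate(cbind(mpg, cyl, disp, wt) ~ am, data = mtcars, FUN = mean)
print(means)
```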
## R version 3.4.0 (2017-04-21)
## Platform: x86_64-apple-darwin15.6.0 (64-bit)
## Running under: macOS Sierra 10.12.6
##
## Matrix products: default
## BLAS: /Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRblas.0.dylib
## LAPACK: /Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRlapack.dylib
##
## locale:
## [1] de_DE.UTF-8/de_DE.UTF-8/de_DE.UTF-8/C/de_DE.UTF-8/de_DE.UTF-8
##
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
##
## other attached packages:
## [1] car_2.1-5 corrplot_0.77 ggplot2_2.2.1
##
## loaded via a namespace (and not attached):
## [1] Rcpp_0.12.11 compiler_3.4.0 nloptr_1.0.4
## [4] plyr_1.8.4 tools_3.4.0 digest_0.6.12
## [7] lme4_1.1-14 evaluate_0.10.1 tibble_1.3.3
## [10] gtable_0.2.0 nlme_3.1-131 lattice_0.20-35
## [13] mgcv_1.8-17 rlang_0.1.1 Matrix_1.2-9
## [16] yaml_2.1.14 parallel_3.4.0 SparseM_1.77
## [19] stringr_1.2.0 knitr_1.16 MatrixModels_0.4-1
## [22] rprojroot_1.2 grid_3.4.0 nnet_7.3-12
## [25] rmarkdown_1.6 minqa_1.2.4 magrittr_1.5
## [28] backports_1.1.0 scales_0.4.1 htmltools_0.3.6
## [31] MASS_7.3-47 splines_3.4.0 pbkrtest_0.4-7
## [34] colorspace_1.3-2 labeling_0.3 quantreg_5.33
## [37] stringi_1.1.5 lazyeval_0.2.0 munsell_0.4.3