No difference between an automatic and a manual transmission for MPG after covariate adjustment

author: angelayuan
date: Sunday, April 26, 2015

Synopsis

In this report I investigate the question “Is an automatic or manual transmission better for MPG” and quantify the MPG difference between automatic and manual transmissions. Using the mtcars data, I estimate an expected 0.26 increase in miles/gallon for a manual transmission relative to an automatic transmission, holding the other variables in the selected model (cyl, disp, drat, and wt) constant. This difference is not statistically significant (p = 0.87), so the data provide no evidence of a difference in MPG between automatic and manual transmissions after covariate adjustment.

Exploratory data analyses

First, I load the mtcars data from the R datasets package. Second, I plot all pairs of variables in the mtcars dataset to explore their relationships (see Figure 1 in the appendix). Seven variables (mpg, cyl, disp, hp, drat, wt, and gear) show clear differences between automatic and manual transmissions.

library(datasets); data(mtcars)
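As a numeric complement to the pairs plot, the unadjusted difference in mean mpg between the two transmission types can be computed directly. This is a minimal sketch. The names `group_means` and `raw_diff` are illustrative and not part of the original analysis. In mtcars, `am` is coded 0 for automatic and 1 for manual.

```r
library(datasets)
data(mtcars)

# Mean mpg by transmission type (am: 0 = automatic, 1 = manual)
group_means <- tapply(mtcars$mpg, mtcars$am, mean)
print(round(group_means, 3))

# Unadjusted difference in mean mpg, manual minus automatic
raw_diff <- group_means[["1"]] - group_means[["0"]]
print(round(raw_diff, 3))
```

This raw difference equals the am coefficient of a regression with am as the only regressor, which is why it serves as the baseline for the adjusted models.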

Regression models

Because the quantity of interest is the coefficient of am, I use covariate adjustment across multiple models to check how robust that coefficient is and to see which covariates attenuate it. The first model includes only the regressor am. Each subsequent model adds one second regressor chosen from cyl, disp, hp, drat, wt, and gear. The coefficient of am in each model is shown below.

fit1 <- lm(mpg ~ factor(am), data=mtcars); fit2 <- lm(mpg ~ factor(am)+factor(cyl), data=mtcars)
fit3 <- lm(mpg ~ factor(am)+disp, data=mtcars); fit4 <- lm(mpg ~ factor(am)+hp, data=mtcars)
fit5 <- lm(mpg ~ factor(am)+drat, data=mtcars); fit6 <- lm(mpg ~ factor(am)+wt, data=mtcars)
fit7 <- lm(mpg ~ factor(am)+gear, data=mtcars)
out <- c(summary(fit1)$coef[2,1],summary(fit2)$coef[2,1],summary(fit3)$coef[2,1],summary(fit4)$coef[2,1],summary(fit5)$coef[2,1],summary(fit6)$coef[2,1],summary(fit7)$coef[2,1])
names(out) <- c("model 1","model 2","model 3","model 4","model 5","model 6","model 7")
round(out,3)
## model 1 model 2 model 3 model 4 model 5 model 6 model 7 
##   7.245   2.560   1.833   5.277   2.807  -0.024   7.142
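The seven fits above can also be produced in a loop, which makes it easier to add or swap covariates. The following is a sketch of one possible refactoring, not the code used for the report. The names `covariates` and `am_coef` are assumptions introduced here.

```r
library(datasets)
data(mtcars)

# Second regressors to try alongside am; NA marks the am-only model
covariates <- c(NA, "factor(cyl)", "disp", "hp", "drat", "wt", "gear")

# Fit one model per entry and extract the coefficient of am (manual vs. automatic)
am_coef <- sapply(covariates, function(v) {
  rhs <- if (is.na(v)) "factor(am)" else paste("factor(am)", v, sep = " + ")
  fit <- lm(as.formula(paste("mpg ~", rhs)), data = mtcars)
  coef(fit)[["factor(am)1"]]
})
names(am_coef) <- paste("model", seq_along(am_coef))
print(round(am_coef, 3))
```

Extracting the coefficient by name (`"factor(am)1"`) rather than by row position keeps the lookup correct if the order of terms in the formula changes.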

From the results above, the coefficient of am changes most when cyl, disp, drat, or wt is added. I then fit three nested models that add these covariates step by step and compare them with nested-model F tests (anova) to decide which model to select.

fit8 <- lm(mpg ~ factor(am)+factor(cyl)+disp, data=mtcars);fit9 <- update(fit8, mpg ~ factor(am)+factor(cyl)+disp+drat)
fit10 <- update(fit9, mpg ~ factor(am)+factor(cyl)+disp+drat+wt)
anova(fit8,fit9,fit10)
## Analysis of Variance Table
## 
## Model 1: mpg ~ factor(am) + factor(cyl) + disp
## Model 2: mpg ~ factor(am) + factor(cyl) + disp + drat
## Model 3: mpg ~ factor(am) + factor(cyl) + disp + drat + wt
##   Res.Df    RSS Df Sum of Sq      F  Pr(>F)  
## 1     27 230.46                              
## 2     26 230.33  1     0.128 0.0175 0.89573  
## 3     25 182.68  1    47.657 6.5221 0.01713 *
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

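The comparison shows no significant difference between model 8 and model 9 (p = 0.896) but a significant difference between model 9 and model 10 (p = 0.017). Here models 8, 9, and 10 are fit8, fit9, and fit10, listed as Model 1, 2, and 3 in the table. Model 10 should therefore be selected. Its adjusted R-squared is 0.7988. The coefficient of am in model 10 is as follows.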

summary(fit10)$coef[2,]
##   Estimate Std. Error    t value   Pr(>|t|) 
##  0.2613658  1.5396966  0.1697515  0.8665717
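A confidence interval conveys the same information as the t-test above while also showing the range of plausible effect sizes. The sketch below refits model 10 so that it is self-contained. The names `am_ci` and `contains_zero` are illustrative, and the 95% level is an assumption, since the report does not state a level.

```r
library(datasets)
data(mtcars)

# Model 10 from the report
fit10 <- lm(mpg ~ factor(am) + factor(cyl) + disp + drat + wt, data = mtcars)

# 95% confidence interval for the manual-vs-automatic coefficient
am_ci <- confint(fit10, parm = "factor(am)1", level = 0.95)
print(round(am_ci, 2))

# An interval that contains zero corresponds to a non-significant t-test
contains_zero <- am_ci[1, 1] < 0 && am_ci[1, 2] > 0
print(contains_zero)
```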

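Finally, I examine model 10 with residual plots (see Figure 2 in the appendix) and two influence diagnostics: the range of the hat values (leverage) and the range of the dfbetas, computed in the following code. The results suggest that model 10 is a suitable model.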

c(max(hatvalues(fit10))-min(hatvalues(fit10)),max(dfbetas(fit10))-min(dfbetas(fit10)))
## [1] 0.2567406 1.1302703
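The ranges above summarize the diagnostics in two numbers. To see which cars drive them, the individual values can be inspected. This is a sketch only: the cutoff of twice the average leverage is a common rule of thumb and an assumption here, not a criterion used in the report, and the names `lev`, `high_lev`, and `dfb_am` are illustrative.

```r
library(datasets)
data(mtcars)

# Model 10 from the report
fit10 <- lm(mpg ~ factor(am) + factor(cyl) + disp + drat + wt, data = mtcars)

# Leverage of each car; hat values sum to the number of coefficients p
lev <- hatvalues(fit10)
p <- length(coef(fit10))
n <- nrow(mtcars)

# Cars above twice the average leverage p/n (rule-of-thumb cutoff, an assumption)
high_lev <- lev[lev > 2 * p / n]
print(round(high_lev, 3))

# Cars whose removal would shift the am coefficient the most
dfb_am <- dfbetas(fit10)[, "factor(am)1"]
print(round(head(sort(abs(dfb_am), decreasing = TRUE), 3), 3))
```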

Conclusion

We can conclude that (1) there is an expected 0.26 increase in miles/gallon for a manual transmission compared with an automatic transmission, holding the remaining variables in model 10 constant; (2) this difference is not statistically significant (p = 0.87), so there is no evidence of a difference in MPG between automatic and manual transmissions after covariate adjustment; (3) model 10 has an adjusted R-squared of 0.7988, meaning it accounts for roughly 80% of the variation in mpg after penalizing for the number of predictors.
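The figure in point (3) can be checked from the model summary. The sketch below refits model 10 and prints both the plain and the adjusted R-squared, since the two are easy to confuse. The names `r2` and `adj_r2` are illustrative.

```r
library(datasets)
data(mtcars)

# Model 10 from the report
fit10 <- lm(mpg ~ factor(am) + factor(cyl) + disp + drat + wt, data = mtcars)

s <- summary(fit10)
r2 <- s$r.squared          # share of variation in mpg explained by the model
adj_r2 <- s$adj.r.squared  # same, penalized for the number of predictors
print(round(c(r.squared = r2, adj.r.squared = adj_r2), 4))
```

The adjusted value is always at most the plain R-squared, so it is the more conservative of the two to report.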

Appendix

Figure 1: Pairwise relationships between all variables in the mtcars dataset, with points colored by transmission type.

par(mar = c(0.5,0.5,0.5,0.5))
pairs(mtcars, panel=panel.smooth, main="mtcars data", col= 3+(mtcars$am==0))

Figure 2: Residual diagnostic plots for model 10.

par(mfrow = c(2,2))
plot(fit10)