Executive Summary: This Motor Trend article examines the relationship between transmission type and fuel economy. We compared the miles per gallon (mpg) of Manual and Automatic transmission cars to ask whether Manual transmissions get better gas mileage. In this data the answer is yes, but the difference may be due more to other characteristics (lower weight, fewer cylinders) than to the transmission type itself.
Data Transformation:
Using the mtcars dataset, we begin by converting certain variables (cyl, vs, gear and carb) from numeric to factors. We also convert the am variable from its numeric codes 0 and 1 to a factor labeled Automatic and Manual, respectively, for easier interpretation.
data(mtcars)
mtcars$cyl <- factor(mtcars$cyl)
mtcars$vs <- factor(mtcars$vs)
mtcars$gear <- factor(mtcars$gear)
mtcars$carb <- factor(mtcars$carb)
mtcars$am <- factor(mtcars$am, labels = c("Automatic", "Manual"))
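Because the labels are assigned by position, it is worth confirming that code 0 really became Automatic and code 1 became Manual. The following is a minimal sketch of such a check; the copy named `cars` and the object `check` are hypothetical names introduced here, not part of the original analysis.

```r
# Work on a copy (hypothetical name `cars`) so the raw 0/1 codes stay available.
cars <- mtcars
cars$am <- factor(cars$am, labels = c("Automatic", "Manual"))

# Cross-tabulate the raw codes against the new labels.
# Every car should fall on the diagonal: 0 -> Automatic, 1 -> Manual.
check <- table(raw = mtcars$am, label = cars$am)
print(check)
```

If any count appeared off the diagonal, the labels vector would be in the wrong order.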
Exploratory: A simple boxplot comparing miles per gallon (mpg) by transmission type is in Figure 1. We can clearly see a difference between the two, also shown by the average mpg:
aggregate(mpg ~ am, data = mtcars, mean)
## am mpg
## 1 Automatic 17.15
## 2 Manual 24.39
At a glance, Manual transmissions seem to get better gas mileage, but can we quantify how large the difference is, and whether it is really due to transmission type or to some other vehicle characteristic?
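One way to put a first number on the gap, before any modeling, is a two-sample t-test on the group means. This is a sketch only: the Welch t-test (R's default for `t.test`) is not used in the original analysis, and `cars`, `tt` and `gap` are hypothetical names.

```r
# Rebuild the labeled transmission factor on a copy of the data.
cars <- mtcars
cars$am <- factor(cars$am, labels = c("Automatic", "Manual"))

# Welch two-sample t-test of mpg by transmission type
# (var.equal = FALSE is the default, so group variances may differ).
tt <- t.test(mpg ~ am, data = cars)

# Difference in group means: Manual minus Automatic.
gap <- unname(diff(tt$estimate))

print(tt)
print(gap)
```

The gap computed here is the same quantity as the difference between the two averages reported above, and it ignores every other vehicle characteristic.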
Quantifying Automatic vs. Manual:
Our model selection strategy is to fit one model using only mpg and am, and then a second model that considers all available variables. We begin with a simple linear model of miles per gallon (mpg) on transmission type (am):
fit.1 <- lm(mpg ~ am, data = mtcars)
summary(fit.1)$coef
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 17.147 1.125 15.247 1.134e-15
## amManual 7.245 1.764 4.106 2.850e-04
Based on this simple model, the amManual coefficient is significant (p-value < .05): Manual cars average 7.245 mpg more than Automatic cars. However, the R2 implies the model explains only 36% of the variance. Can we do better?
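The R2 figure is not part of the coefficient table shown above; it comes from the model summary. A minimal sketch of how it can be read off, using the hypothetical names `cars` and `fit_am` so the example runs on its own:

```r
# Refit the transmission-only model on a labeled copy of the data.
cars <- mtcars
cars$am <- factor(cars$am, labels = c("Automatic", "Manual"))
fit_am <- lm(mpg ~ am, data = cars)

# Share of the variance in mpg explained by transmission type alone.
r2 <- summary(fit_am)$r.squared

# Adjusted version, which penalizes the number of predictors.
adj_r2 <- summary(fit_am)$adj.r.squared

print(round(c(r.squared = r2, adj.r.squared = adj_r2), 4))
```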
Next we model mpg on all variables in the dataset, using stepwise selection in both directions (forward and backward) to choose the best-fitting variables:
fit.2 <- step(lm(mpg ~ ., data = mtcars), direction = "both")
summary(fit.2)$coef
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 33.70832 2.60489 12.9404 7.733e-13
## cyl6 -3.03134 1.40728 -2.1540 4.068e-02
## cyl8 -2.16368 2.28425 -0.9472 3.523e-01
## hp -0.03211 0.01369 -2.3450 2.693e-02
## wt -2.49683 0.88559 -2.8194 9.081e-03
## amManual 1.80921 1.39630 1.2957 2.065e-01
The stepwise procedure retains cyl, hp, wt and am as predictors of mpg. Holding the other variables constant, a Manual transmission vehicle is expected to get 1.8092 more mpg than an Automatic, although this estimate is not statistically significant (p-value of 0.2065). Each additional 1000 lb of weight (wt) is expected to decrease mpg by 2.4968. Cars with higher horsepower (hp) and cars with 6 or 8 cylinders (cyl6 and cyl8) are expected to get lower mpg than cars with lower horsepower and 4 cylinder cars.
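Confidence intervals make the uncertainty around each coefficient easier to see than point estimates alone. The sketch below fits the selected formula directly instead of rerunning `step()`, which is an assumption made to keep the example short; `cars`, `fit_sel` and `ci` are hypothetical names.

```r
# Recreate the factors used by the selected model on a copy of the data.
cars <- mtcars
cars$cyl <- factor(cars$cyl)
cars$am <- factor(cars$am, labels = c("Automatic", "Manual"))

# Fit the formula that the stepwise search settled on.
fit_sel <- lm(mpg ~ cyl + hp + wt + am, data = cars)

# 95% confidence intervals for every coefficient.
ci <- confint(fit_sel, level = 0.95)
print(ci)

# An interval for amManual that spans 0 means the adjusted
# transmission effect cannot be distinguished from no effect.
print(ci["amManual", ])
```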
Comparing the second model (fit.2) to the first (fit.1) using an Analysis of Variance (ANOVA) shows that the multivariable model fits significantly better than the first model (p-value of 1.7e-08).
anova(fit.1, fit.2)
## Analysis of Variance Table
##
## Model 1: mpg ~ am
## Model 2: mpg ~ cyl + hp + wt + am
## Res.Df RSS Df Sum of Sq F Pr(>F)
## 1 30 721
## 2 26 151 4 570 24.5 1.7e-08 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
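The F statistic in the table can be reproduced by hand from the two residual sums of squares, which shows what the ANOVA comparison is measuring. This is a sketch with hypothetical object names (`cars`, `fit_am`, `fit_sel`), and it fits the selected formula directly instead of rerunning `step()`.

```r
# Rebuild both models on a copy of the data.
cars <- mtcars
cars$cyl <- factor(cars$cyl)
cars$am <- factor(cars$am, labels = c("Automatic", "Manual"))
fit_am <- lm(mpg ~ am, data = cars)
fit_sel <- lm(mpg ~ cyl + hp + wt + am, data = cars)

# Residual sums of squares and residual degrees of freedom.
rss1 <- deviance(fit_am)
rss2 <- deviance(fit_sel)
df1 <- df.residual(fit_am)
df2 <- df.residual(fit_sel)

# Drop in RSS per added parameter, relative to the larger model's
# residual variance.
f_stat <- ((rss1 - rss2) / (df1 - df2)) / (rss2 / df2)

# Upper-tail probability under the F distribution.
p_val <- pf(f_stat, df1 - df2, df2, lower.tail = FALSE)

print(c(F = f_stat, p = p_val))
```

With the rounded values from the table, this is roughly (570 / 4) / (151 / 26).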
Residuals and Diagnostics:
We then created residual plots for our model (see Figure 2) to check for anything that looks non-normal. The Normal Q-Q plot shows the residuals lying mostly near the line, suggesting they are approximately normally distributed. The Residuals vs. Fitted plot shows points scattered randomly above and below the 0 line.
Next we run some diagnostics to identify influential and/or high leverage points. We look first at the points with the most influence on the amManual coefficient (dfbetas), then at the points with the highest leverage (hat values):
infl <- dfbetas(fit.2)
tail(sort(infl[, "amManual"]), 3)
## Chrysler Imperial Fiat 128 Toyota Corona
## 0.3507 0.4292 0.7305
levrg <- hatvalues(fit.2)
tail(sort(levrg), 3)
## Toyota Corona Lincoln Continental Maserati Bora
## 0.2778 0.2937 0.4714
The cars above are all labeled in one of the residual plots (except the Lincoln Continental), so the diagnostics agree with what the plots show.
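Listing the three largest values does not say whether any of them is large enough to matter. One option is to compare them against conventional cutoffs. The sketch below uses two common rules of thumb, 2/sqrt(n) for dfbetas and 2p/n for hat values; these thresholds are assumptions added here, not part of the original analysis, and `cars`, `fit_sel`, `flag_db` and `flag_hv` are hypothetical names.

```r
# Refit the selected model on a copy of the data.
cars <- mtcars
cars$cyl <- factor(cars$cyl)
cars$am <- factor(cars$am, labels = c("Automatic", "Manual"))
fit_sel <- lm(mpg ~ cyl + hp + wt + am, data = cars)

n <- nrow(cars)              # number of observations
p <- length(coef(fit_sel))   # number of estimated coefficients

# Cars whose removal would shift the amManual coefficient by more than
# 2 / sqrt(n) standard errors, in either direction.
db <- dfbetas(fit_sel)[, "amManual"]
flag_db <- names(db)[abs(db) > 2 / sqrt(n)]

# Cars whose leverage exceeds twice the average hat value (p / n).
hv <- hatvalues(fit_sel)
flag_hv <- names(hv)[hv > 2 * p / n]

print(flag_db)
print(flag_hv)
```

Cars that appear in either list are candidates for a sensitivity check, for example refitting the model without them.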
Conclusion
Our analysis shows that, overall, cars with a Manual transmission do tend to get better gas mileage on average. Our best model (fit.2) explained 84% of the variance. There is some uncertainty in this conclusion given two findings: 1) the amManual variable in our model was not statistically significant, and 2) the more important variable seems to be the weight of the car; as Figure 3 shows, it could simply be that Automatic transmission vehicles tend to be heavier.
Do Manual transmission cars get better gas mileage? In this data the answer is yes, but it probably has more to do with the weight and number of cylinders generally seen in those vehicles, and not the transmission type itself.
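The link between transmission type, weight and cylinder count can be checked directly in the data. A minimal sketch, with hypothetical names `cars`, `wt_by_am` and `cyl_by_am`:

```r
# Rebuild the factors on a copy of the data.
cars <- mtcars
cars$cyl <- factor(cars$cyl)
cars$am <- factor(cars$am, labels = c("Automatic", "Manual"))

# Average weight (in 1000 lb) for each transmission type.
wt_by_am <- aggregate(wt ~ am, data = cars, mean)
print(wt_by_am)

# Number of cars with each cylinder count, by transmission type.
cyl_by_am <- table(cars$am, cars$cyl)
print(cyl_by_am)
```

If Automatic cars are both heavier and more often 8 cylinder, the raw mpg gap mixes those effects with any effect of the transmission itself.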
Appendix
Figure 1
par(mfrow = c(1, 1))
boxplot(mpg ~ am, data = mtcars, main = "Figure 1: MPG for Automatic vs Manual Transmissions",
names = c("Automatic", "Manual"))
Figure 2: Residual Plots for fit.2
par(mfrow = c(2, 2))
plot(fit.2)
Figure 3
par(mfrow = c(1, 1))
boxplot(wt ~ am, data = mtcars, main = "Figure 3: Weight for Automatic vs Manual Transmissions",
names = c("Automatic", "Manual"))