Executive Summary

This paper investigates and quantifies the difference in miles per gallon (MPG) between automatic and manual transmissions using the mtcars dataset. The data come from the 1974 Motor Trend magazine and include fuel consumption plus 10 car specifications for 32 vehicles. In this dataset, two variables, weight and horsepower, account for most of the apparent effect of transmission type. In other words, while cars with manual transmissions have better gas mileage on average, once weight and horsepower are controlled for, the remaining difference (about 2.08 MPG) is not statistically significant.

Data Prep and Exploratory Analysis

To prepare the data for analysis, I first converted the categorical/discrete variables into factors. I also labeled the levels of the transmission variable as Automatic and Manual to prevent any confusion as to which is which.

mtcars$am <- as.factor(mtcars$am)
mtcars$cyl <- as.factor(mtcars$cyl)
mtcars$vs <- as.factor(mtcars$vs)
mtcars$gear <- as.factor(mtcars$gear)
levels(mtcars$am) <- c("Automatic", "Manual")
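As a quick sanity check on the recoding, the sketch below tabulates the transmission factor. It works on a copy of the data (the name `d` is my own and not part of the original analysis), so it can be run whether or not the conversion above has already been applied.

```r
# Work on a copy so the session's mtcars is left untouched ('d' is a hypothetical name)
d <- mtcars
d$am <- factor(d$am, labels = c("Automatic", "Manual"))

# Count cars per transmission type and confirm the column is now a factor
am_counts <- table(d$am)
print(am_counts)
print(is.factor(d$am))
```

With the stock dataset this should report 19 automatic and 13 manual cars.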

To begin the analysis, I use a boxplot to quickly compare the medians and spread of MPG for the two transmission types.

boxplot(mpg~am, data=mtcars, ylab="MPG")
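The same comparison can be put into numbers. The following is a minimal sketch that summarizes MPG within each transmission group and computes the unadjusted difference in means; the names `d`, `group_stats`, `group_means`, and `raw_diff` are illustrative and do not appear in the original analysis.

```r
# Work on a copy of the data ('d' is a hypothetical name)
d <- mtcars
d$am <- factor(d$am, labels = c("Automatic", "Manual"))

# Mean, median, and standard deviation of MPG within each transmission group
group_stats <- aggregate(
  mpg ~ am, data = d,
  FUN = function(x) c(mean = mean(x), median = median(x), sd = sd(x))
)
print(group_stats)

# Unadjusted difference in mean MPG (manual minus automatic)
group_means <- tapply(d$mpg, d$am, mean)
raw_diff <- unname(group_means["Manual"] - group_means["Automatic"])
print(raw_diff)
```

The unadjusted gap is about 7.2 MPG in favor of manual cars, which is the difference the regression models then try to account for.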

While the boxplot seems to suggest that manual cars get better mileage, and that a regression would show the same effect, digging further into the data suggests that this may not be the full story. From the graphs in the appendix, it looks as though weight and horsepower may be causing the differences between automatic and manual transmissions (see the appendix for more information about why weight and horsepower were chosen).

The models

Here I will use two models (mpg~am+wt+hp and mpg~wt+hp) and contend that there is not a significant difference between the models.

Model 1: Transmission, Weight, and HP

fit<-lm(mpg~am+wt+hp, data=mtcars)
summary(fit)
## 
## Call:
## lm(formula = mpg ~ am + wt + hp, data = mtcars)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -3.4221 -1.7924 -0.3788  1.2249  5.5317 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 34.002875   2.642659  12.867 2.82e-13 ***
## amManual     2.083710   1.376420   1.514 0.141268    
## wt          -2.878575   0.904971  -3.181 0.003574 ** 
## hp          -0.037479   0.009605  -3.902 0.000546 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2.538 on 28 degrees of freedom
## Multiple R-squared:  0.8399, Adjusted R-squared:  0.8227 
## F-statistic: 48.96 on 3 and 28 DF,  p-value: 2.908e-11

This model explains about 84% of the variance in MPG (R-squared = 0.8399). It estimates that, holding weight and horsepower constant, cars with manual transmission get 2.08 more MPG than their automatic competitors, but this coefficient is not statistically significant (p = 0.141).
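One way to see how uncertain the transmission estimate is would be to look at its confidence interval. The sketch below refits Model 1 on a copy of the data and extracts a 95% interval for the `amManual` coefficient; `d`, `fit_am`, and `ci_am` are names I introduce for illustration.

```r
# Refit Model 1 on a copy of the data ('d' and 'fit_am' are hypothetical names)
d <- mtcars
d$am <- factor(d$am, labels = c("Automatic", "Manual"))
fit_am <- lm(mpg ~ am + wt + hp, data = d)

# 95% confidence interval for the manual-vs-automatic coefficient
ci_am <- confint(fit_am, "amManual", level = 0.95)
print(ci_am)
```

The interval runs from roughly -0.7 to 4.9 MPG, so it includes zero, which is consistent with the p-value of 0.141 in the summary above.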

Model 2: Only Weight and HP

fit2<-lm(mpg~wt+hp, data=mtcars)
summary(fit2)
## 
## Call:
## lm(formula = mpg ~ wt + hp, data = mtcars)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
## -3.941 -1.600 -0.182  1.050  5.854 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 37.22727    1.59879  23.285  < 2e-16 ***
## wt          -3.87783    0.63273  -6.129 1.12e-06 ***
## hp          -0.03177    0.00903  -3.519  0.00145 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2.593 on 29 degrees of freedom
## Multiple R-squared:  0.8268, Adjusted R-squared:  0.8148 
## F-statistic: 69.21 on 2 and 29 DF,  p-value: 9.109e-12

Interestingly, Model 2 explains about 83% of the variance in MPG (adjusted R-squared of 0.8148, versus 0.8227 for Model 1) with one fewer variable than the first model.

Before running with this model, we should check that the residuals are normal.

par(mfrow=c(2,2))
plot(fit2)

Based on these graphs (the normal Q-Q plot in the top right in particular), the residuals appear approximately normal.
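A numeric complement to the Q-Q plot is sketched below, assuming a Shapiro-Wilk test is an acceptable check here. This test is my addition and not part of the original analysis. With only 32 observations, its p-value and the visual impression may not agree, so it is best read alongside the plots rather than in place of them.

```r
# Refit Model 2 ('fit_wh' is a hypothetical name; wt and hp are numeric in mtcars)
fit_wh <- lm(mpg ~ wt + hp, data = mtcars)

# Shapiro-Wilk test; the null hypothesis is that the residuals are normally distributed
sw <- shapiro.test(residuals(fit_wh))
print(sw)
```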

Model comparison

anova(fit,fit2)
## Analysis of Variance Table
## 
## Model 1: mpg ~ am + wt + hp
## Model 2: mpg ~ wt + hp
##   Res.Df    RSS Df Sum of Sq      F Pr(>F)
## 1     28 180.29                           
## 2     29 195.05 -1   -14.757 2.2918 0.1413

This ANOVA shows that there is not a significant difference between the models (p = 0.1413, so we fail to reject the null hypothesis that the transmission coefficient is zero), thus suggesting that the impact of weight and horsepower might explain away the difference between manual and automatic transmissions. There is also a possibility that the causation runs the other way (i.e. transmission type determines the weight and horsepower of the vehicle). Intuitively, this seems less plausible, but this analysis does not rule out such a possibility.
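Because the two models differ by a single coefficient, the F test in the ANOVA table is equivalent to the t test on `amManual` in Model 1: the F statistic is the square of the t statistic, and the p-values match. The sketch below checks this; `d`, `fit_full`, and `fit_reduced` are illustrative names.

```r
# Refit both models on a copy of the data (hypothetical names)
d <- mtcars
d$am <- factor(d$am, labels = c("Automatic", "Manual"))
fit_full    <- lm(mpg ~ am + wt + hp, data = d)
fit_reduced <- lm(mpg ~ wt + hp, data = d)

# F statistic from the nested-model comparison (reduced model listed first)
f_stat <- anova(fit_reduced, fit_full)$F[2]

# t statistic for the transmission coefficient in the full model
t_stat <- summary(fit_full)$coefficients["amManual", "t value"]

# The two values agree up to floating-point error
print(c(F = f_stat, t_squared = t_stat^2))
```

Listing the reduced model first only flips the sign of the Df and Sum of Sq columns; the F statistic and p-value are unchanged.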

Conclusion

While there are other potential explanations, I believe that this analysis shows that horsepower and weight are better predictors of MPG than transmission type: once they are in the model, adding transmission type does not significantly improve the fit. With a larger dataset, it would be possible to better tease apart these differences.

Appendix

Weight and Horsepower graphs explaining MPG

plot(mpg~wt,col=c("red","blue")[am], pch=19, data=mtcars)
legend(x=4,y=30, c("automatic","manual"),cex=.8, col=c("red","blue"),pch=19)

plot(mpg~hp,col=c("red","blue")[am], pch=19, data=mtcars)
legend(x=250,y=30, c("automatic","manual"),cex=.8, col=c("red","blue"),pch=19)

How weight and horsepower were selected

I chose weight and horsepower as the explanatory variables because, in a “kitchen sink” model that included all of the variables, they had the smallest p-values (0.0789 and 0.0879), although neither is significant at the 0.05 level in that model.

kitchensinkfit<-lm(mpg~., data=mtcars)
summary(kitchensinkfit)
## 
## Call:
## lm(formula = mpg ~ ., data = mtcars)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -3.2015 -1.2319  0.1033  1.1953  4.3085 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)  
## (Intercept) 15.09262   17.13627   0.881   0.3895  
## cyl6        -1.19940    2.38736  -0.502   0.6212  
## cyl8         3.05492    4.82987   0.633   0.5346  
## disp         0.01257    0.01774   0.708   0.4873  
## hp          -0.05712    0.03175  -1.799   0.0879 .
## drat         0.73577    1.98461   0.371   0.7149  
## wt          -3.54512    1.90895  -1.857   0.0789 .
## qsec         0.76801    0.75222   1.021   0.3201  
## vs1          2.48849    2.54015   0.980   0.3396  
## amManual     3.34736    2.28948   1.462   0.1601  
## gear4       -0.99922    2.94658  -0.339   0.7382  
## gear5        1.06455    3.02730   0.352   0.7290  
## carb         0.78703    1.03599   0.760   0.4568  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2.616 on 19 degrees of freedom
## Multiple R-squared:  0.8845, Adjusted R-squared:  0.8116 
## F-statistic: 12.13 on 12 and 19 DF,  p-value: 1.764e-06
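The ranking described above can be read off the summary table by eye, or extracted programmatically. The sketch below refits the kitchen sink model on a copy of the data and sorts the coefficients by p-value; `d`, `sink_fit`, `pvals`, and `top_two` are names I introduce for illustration.

```r
# Recreate the prepared data on a copy ('d' is a hypothetical name)
d <- mtcars
for (v in c("am", "cyl", "vs", "gear")) d[[v]] <- as.factor(d[[v]])
levels(d$am) <- c("Automatic", "Manual")

# Fit the model with every other variable as a predictor
sink_fit <- lm(mpg ~ ., data = d)
coef_table <- summary(sink_fit)$coefficients

# Drop the intercept and order the remaining terms by p-value
pvals <- sort(coef_table[-1, "Pr(>|t|)"])
print(round(head(pvals, 3), 4))

# Names of the two terms with the smallest p-values
top_two <- names(pvals)[1:2]
print(top_two)
```

Ranking by p-value in a single saturated model is a rough screening heuristic. With 12 coefficients and 32 observations the estimates are unstable, which is why the two-variable models in the main text are compared directly.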