Executive Summary

We use the R data set named “mtcars” to investigate whether there is a difference in fuel efficiency, measured in miles per gallon (mpg), between automatic and manual transmissions. Details about the data set can be found in the appendix.

The analysis below finds that manual-transmission cars in this data set have higher MPG than automatic ones. We test whether that difference is statistically significant and quantify its size.

Exploratory Data Analysis

First we load the data set and then investigate the “am” variable. We turn that variable into a factor variable where 0 is an automatic transmission and 1 is a manual.

Let’s use ggplot2 to visualize any difference between automatic and manual. The plot is in the Appendix.

# Load the dataset (ggplot2 is loaded in the Appendix, where the plot is drawn).
data(mtcars)

# Change the automatic manual variable to be a factor
mtcars$am <- factor(mtcars$am, levels=c(0,1), labels=c("Automatic", "Manual"))

It looks like manual transmissions generally have higher MPG. Let’s do statistical tests to see if the difference is significant.
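Before testing, the visual impression can be backed with group means. This is a minimal sketch, not part of the original analysis; `cars_copy`, `group_means` and `group_sizes` are hypothetical names, and a fresh copy of the data is used so the `mtcars` object prepared above is left untouched.

```r
# Hypothetical working copy; the report's own `mtcars` object is not modified.
cars_copy <- datasets::mtcars
cars_copy$am <- factor(cars_copy$am, levels = c(0, 1),
                       labels = c("Automatic", "Manual"))

# Mean mpg and number of cars within each transmission group
group_means <- tapply(cars_copy$mpg, cars_copy$am, mean)
group_sizes <- table(cars_copy$am)

print(round(group_means, 3))
print(group_sizes)
```

The two means are the same quantities a model with transmission as the only predictor estimates: the intercept is the automatic mean and the slope is the gap between the groups.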

Is an automatic or manual transmission better for MPG?

Let’s run a t-test to see if mean mpg differs between automatic and manual cars. The null hypothesis is that the two group means are equal. The 95% confidence interval for the difference in means (manual minus automatic) is:

## [1]  3.209684 11.280194
## attr(,"conf.level")
## [1] 0.95

The 95% confidence interval for the difference runs from about 3.2 to 11.3 MPG and does not contain zero. Therefore we reject the null hypothesis and conclude that, in this data set, manual cars have significantly higher mean MPG than automatic cars.
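The call that produced this interval is not shown. The sketch below is one way to obtain an interval of this form, assuming Welch's two-sample t-test (the default of `t.test()`) with the manual group passed first, so that the difference is manual minus automatic. The names `cars_copy`, `manual_mpg`, `auto_mpg` and `tt` are hypothetical.

```r
# Hypothetical working copy with the original 0/1 coding of `am`
cars_copy <- datasets::mtcars

manual_mpg <- cars_copy$mpg[cars_copy$am == 1]  # 1 = manual
auto_mpg   <- cars_copy$mpg[cars_copy$am == 0]  # 0 = automatic

# Welch two-sample t-test (var.equal = FALSE is the default)
tt <- t.test(manual_mpg, auto_mpg, conf.level = 0.95)

print(tt$conf.int)  # confidence interval for mean(manual) - mean(automatic)
print(tt$p.value)   # two-sided p-value
```

Using the formula interface, `t.test(mpg ~ am, data = cars_copy)`, takes the difference in the opposite order (automatic minus manual), so the interval bounds would come out negative.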

Quantify the MPG difference between automatic and manual transmissions

First we create a linear model with mpg being determined by only the type of transmission.

# Create a model based only on "am"
fit1 <- lm(formula = mpg ~ am, data = mtcars)
fit1
## 
## Call:
## lm(formula = mpg ~ am, data = mtcars)
## 
## Coefficients:
## (Intercept)     amManual  
##      17.147        7.245

This is our first fitted model. It shows that manual cars average 7.245 more MPG than automatic cars (17.147 MPG), but it does not account for other variables that may be relevant. It is likely that a difference in weight between manual and automatic cars, rather than the transmission itself, explains much of the gap.
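The suspicion about weight can be checked directly. The following is a minimal sketch with hypothetical names (`cars_copy`, `wt_means`, `wt_mpg_cor`); it compares mean weight across the two groups and measures how strongly weight and mpg move together.

```r
# Hypothetical working copy; the report's own `mtcars` object is not modified.
cars_copy <- datasets::mtcars
cars_copy$am <- factor(cars_copy$am, levels = c(0, 1),
                       labels = c("Automatic", "Manual"))

# Mean weight (in 1000 lbs) for each transmission type
wt_means <- tapply(cars_copy$wt, cars_copy$am, mean)
print(round(wt_means, 3))

# Pearson correlation between weight and mpg over all cars
wt_mpg_cor <- cor(cars_copy$wt, cars_copy$mpg)
print(round(wt_mpg_cor, 3))
```

If the automatic cars are heavier on average and weight is strongly negatively correlated with mpg, part of the 7.245 MPG gap reflects weight rather than transmission type.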

Our second model will be mpg as a function of all remaining variables:

# Create a model based on all the variables in the dataset
fit2 <- lm(formula = mpg ~ ., data = mtcars)
fit2
## 
## Call:
## lm(formula = mpg ~ ., data = mtcars)
## 
## Coefficients:
## (Intercept)          cyl         disp           hp         drat  
##    12.30337     -0.11144      0.01334     -0.02148      0.78711  
##          wt         qsec           vs     amManual         gear  
##    -3.71530      0.82104      0.31776      2.52023      0.65541  
##        carb  
##    -0.19942
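With ten predictors and only 32 cars, it is worth checking how well each coefficient of a full model like this one is determined. The sketch below, using the hypothetical names `cars_copy`, `full_fit` and `full_table`, prints the estimates next to their standard errors and p-values; large standard errors relative to the estimates are a sign that the model carries more variables than the data can support.

```r
# Hypothetical working copy; the report's own `mtcars` object is not modified.
cars_copy <- datasets::mtcars
cars_copy$am <- factor(cars_copy$am, levels = c(0, 1),
                       labels = c("Automatic", "Manual"))

# Refit the all-variables model and pull out the coefficient table
full_fit   <- lm(mpg ~ ., data = cars_copy)
full_table <- summary(full_fit)$coefficients

print(round(full_table[, c("Estimate", "Std. Error", "Pr(>|t|)")], 4))

# Adjusted R-squared penalises the extra predictors
print(summary(full_fit)$adj.r.squared)
```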

This model includes too many variables for only 32 observations and is likely overfitted. Let’s use R’s stepwise model selection function, step(), to choose a smaller model. The call and the selected model are shown below.

# Use stepwise selection (by AIC) to drop variables that do not improve the model.
fit3 <- step(fit2, trace=0, steps=10000)
summary(fit3)
## 
## Call:
## lm(formula = mpg ~ wt + qsec + am, data = mtcars)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -3.4811 -1.5555 -0.7257  1.4110  4.6610 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   9.6178     6.9596   1.382 0.177915    
## wt           -3.9165     0.7112  -5.507 6.95e-06 ***
## qsec          1.2259     0.2887   4.247 0.000216 ***
## amManual      2.9358     1.4109   2.081 0.046716 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2.459 on 28 degrees of freedom
## Multiple R-squared:  0.8497, Adjusted R-squared:  0.8336 
## F-statistic: 52.75 on 3 and 28 DF,  p-value: 1.21e-11

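This shows that when weight (wt) and quarter-mile time (qsec) are held fixed, the estimated difference between automatic and manual drops from 7.245 to 2.936 MPG. The model accounts for about 83% of the variation in MPG (adjusted R-squared 0.8336). All three predictors are significant at the 5% level, although the transmission term only narrowly (p = 0.047).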
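The three models can also be compared side by side. This is a sketch under the assumption that an F-test for nested models and AIC are acceptable yardsticks here; `cars_copy`, `am_only`, `reduced` and `full` are hypothetical names that refit the same formulas as the models above.

```r
# Hypothetical working copy; the report's own objects are not modified.
cars_copy <- datasets::mtcars
cars_copy$am <- factor(cars_copy$am, levels = c(0, 1),
                       labels = c("Automatic", "Manual"))

am_only <- lm(mpg ~ am, data = cars_copy)              # transmission only
reduced <- lm(mpg ~ wt + qsec + am, data = cars_copy)  # stepwise-selected formula
full    <- lm(mpg ~ ., data = cars_copy)               # all variables

# F-test: do wt and qsec add explanatory power beyond transmission alone?
print(anova(am_only, reduced))

# AIC for each model; lower values indicate a better fit/complexity trade-off
print(AIC(am_only, reduced, full))
```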

Let’s plot the residuals and see if this model has any problems. See the appendix for the plot.

Examining the residual plots, the residuals appear approximately normal and roughly centered on zero. We can therefore draw conclusions from our final model.
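A numeric check can complement the visual one. The sketch below applies the Shapiro-Wilk normality test to the residuals of the selected formula; this test is an addition for illustration, not part of the original analysis, and `cars_copy`, `reduced`, `res` and `sw` are hypothetical names.

```r
# Hypothetical working copy; the report's own objects are not modified.
cars_copy <- datasets::mtcars
cars_copy$am <- factor(cars_copy$am, levels = c(0, 1),
                       labels = c("Automatic", "Manual"))

reduced <- lm(mpg ~ wt + qsec + am, data = cars_copy)
res     <- resid(reduced)

# Shapiro-Wilk test: the null hypothesis is that the residuals are normal
sw <- shapiro.test(res)
print(sw)

# The three cars with the highest leverage on the fit
print(head(sort(hatvalues(reduced), decreasing = TRUE), 3))
```

A small p-value would cast doubt on the normality assumption; with only 32 observations the test has limited power, so it supplements the plots rather than replacing them.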

Holding weight and quarter-mile time constant, manual transmissions get an estimated 2.94 more miles per gallon than automatic transmissions.
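A point estimate alone hides how uncertain it is. With a standard error of 1.41, a 95% interval around the transmission effect is roughly the estimate plus or minus two standard errors, so it should be wide. A minimal sketch using `confint()`, with the hypothetical names `cars_copy`, `reduced` and `am_ci`:

```r
# Hypothetical working copy; the report's own objects are not modified.
cars_copy <- datasets::mtcars
cars_copy$am <- factor(cars_copy$am, levels = c(0, 1),
                       labels = c("Automatic", "Manual"))

reduced <- lm(mpg ~ wt + qsec + am, data = cars_copy)

# 95% confidence interval for the manual-vs-automatic coefficient
am_ci <- confint(reduced, "amManual", level = 0.95)
print(am_ci)
```

An interval whose lower bound sits close to zero is consistent with the p-value of 0.047 reported in the model summary.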

Conclusions

We conclude that, in this data set, manual transmissions are associated with better fuel efficiency than automatic transmissions. Considered alone, manual cars get 7.25 more MPG on average than automatic cars. Once other variables are taken into account, the effect shrinks: holding weight and quarter-mile time (wt and qsec) constant, manual transmissions get only about 2.94 more MPG.

Appendix

From the R Documentation on the “mtcars” data set:

“The data was extracted from the 1974 Motor Trend US magazine, and comprises fuel consumption and 10 aspects of automobile design and performance for 32 automobiles (1973-74 models).”

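The mtcars data set contains 32 observations on 11 variables. The description and (variable name) of each are: miles per US gallon (mpg); number of cylinders (cyl); displacement in cubic inches (disp); gross horsepower (hp); rear axle ratio (drat); weight in 1000 lbs (wt); quarter-mile time (qsec); engine shape, 0 = V-shaped, 1 = straight (vs); transmission, 0 = automatic, 1 = manual (am); number of forward gears (gear); number of carburetors (carb).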

Appendix Tables

# Plot mpg as a function of being automatic or manual
library(ggplot2)
g <- ggplot(data=mtcars, aes(x=am, y=mpg))
g <- g + geom_point()
print(g)

# Create a 2 by 2 area and plot the residuals of the best model
par(mfrow=c(2,2))
plot(fit3)