We want to use the R data set named “mtcars” to investigate whether there is a difference in miles per gallon (mpg) fuel efficiency between automatic and manual transmissions. Details about the data set can be found in the appendix.
The analysis below finds a clear difference in MPG between automatic and manual transmissions. We test whether that difference is statistically significant and quantify its size.
First we load the data set and then investigate the “am” variable. We turn that variable into a factor variable where 0 is an automatic transmission and 1 is a manual.
Let’s use ggplot2 to visualize any difference between automatic and manual. The plot is in the Appendix.
# Load the dataset (ggplot2 is loaded in the appendix, where the plot is drawn).
data(mtcars)
# Change the automatic manual variable to be a factor
mtcars$am <- factor(mtcars$am, levels=c(0,1), labels=c("Automatic", "Manual"))
It looks like manual transmissions generally have higher MPG. Let’s do statistical tests to see if the difference is significant.
Let’s run a t-test to see if mean mpg differs between automatic and manual cars. The null hypothesis is that there is no difference in the means between the two groups.
## [1] 3.209684 11.280194
## attr(,"conf.level")
## [1] 0.95
The 95% confidence interval for the difference in mean mpg (manual minus automatic) runs from about 3.2 to 11.3 and does not contain zero. Therefore we reject the null hypothesis. We conclude that, in this data set, manual transmission cars have higher MPG on average.
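The call that produced the interval above is not shown. The sketch below is one way it could be obtained, assuming a Welch two-sample t-test (the default for `t.test`) with the manual group passed first so that the interval is manual minus automatic. The name `cars` is a local copy introduced here for illustration and is not part of the original analysis.

```r
# Work on a local copy so the example is self-contained
# (`cars` is a name chosen for this sketch).
cars <- datasets::mtcars
cars$am <- factor(cars$am, levels = c(0, 1), labels = c("Automatic", "Manual"))

# Welch two-sample t-test (var.equal = FALSE is the default).
# Manual is passed first, so the interval is for mean(manual) - mean(automatic).
tt <- t.test(cars$mpg[cars$am == "Manual"],
             cars$mpg[cars$am == "Automatic"])

tt$conf.int  # 95% confidence interval for the difference in means
tt$p.value   # p-value for the null hypothesis of equal means
```

Swapping the order of the two groups flips the sign of the interval but not the conclusion.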
First we create a linear model with mpg being determined by only the type of transmission.
# Create a model based only on "am"
fit1 <- lm(formula = mpg ~ am, data = mtcars)
fit1
##
## Call:
## lm(formula = mpg ~ am, data = mtcars)
##
## Coefficients:
## (Intercept)     amManual
##      17.147        7.245
This is our first fitted model, and it shows that manual transmission cars average 7.245 mpg more than automatic ones. It doesn’t account for other variables that may be relevant. For example, a difference in weight between manual and automatic cars may be what actually drives the better MPG.
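The suspicion about weight can be checked directly. The following is a minimal sketch, not part of the original analysis: it compares mean weight by transmission type and refits the simple model with `wt` added. The names `cars`, `wt_means` and `fit_wt` are introduced here for illustration.

```r
# Local copy with the same factor coding used in the analysis.
cars <- datasets::mtcars
cars$am <- factor(cars$am, levels = c(0, 1), labels = c("Automatic", "Manual"))

# Mean weight (in 1000 lbs) for each transmission type.
wt_means <- tapply(cars$wt, cars$am, mean)
wt_means

# Transmission effect once weight is included as a predictor.
fit_wt <- lm(mpg ~ am + wt, data = cars)
coef(fit_wt)
```

If the `amManual` coefficient shrinks sharply once `wt` enters the model, weight is a plausible confounder for the 7.245 figure.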
Our second model will be mpg as a function of all remaining variables:
# Create a model based on all the variables in the dataset
fit2 <- lm(formula = mpg ~ ., data = mtcars)
fit2
##
## Call:
## lm(formula = mpg ~ ., data = mtcars)
##
## Coefficients:
## (Intercept)          cyl         disp           hp         drat
##    12.30337     -0.11144      0.01334     -0.02148      0.78711
##          wt         qsec           vs     amManual         gear
##    -3.71530      0.82104      0.31776      2.52023      0.65541
##        carb
##    -0.19942
This model includes every variable and is likely overfitted: with only 32 observations, ten predictors are too many. Let’s use R’s stepwise model selection function, step(), to choose a smaller model. The call that selects the model is shown below.
# Use stepwise selection (by AIC) to drop variables that do not improve the model.
fit3 <- step(fit2, trace=0, steps=10000)
summary(fit3)
##
## Call:
## lm(formula = mpg ~ wt + qsec + am, data = mtcars)
##
## Residuals:
##     Min      1Q  Median      3Q     Max
## -3.4811 -1.5555 -0.7257  1.4110  4.6610
##
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept)   9.6178     6.9596   1.382 0.177915
## wt           -3.9165     0.7112  -5.507 6.95e-06 ***
## qsec          1.2259     0.2887   4.247 0.000216 ***
## amManual      2.9358     1.4109   2.081 0.046716 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.459 on 28 degrees of freedom
## Multiple R-squared: 0.8497, Adjusted R-squared: 0.8336
## F-statistic: 52.75 on 3 and 28 DF, p-value: 1.21e-11
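To make the selection step less of a black box, the sketch below reruns it on a local copy and compares the AIC of the full and selected models. With no `scope` argument, `step()` starts from the full model and by default uses AIC as its criterion. The names `cars`, `full` and `selected` are introduced here for illustration.

```r
# Local copy with the same factor coding used in the analysis.
cars <- datasets::mtcars
cars$am <- factor(cars$am, levels = c(0, 1), labels = c("Automatic", "Manual"))

# Full model with every predictor, then stepwise selection by AIC.
full <- lm(mpg ~ ., data = cars)
selected <- step(full, trace = 0)

formula(selected)  # predictors that survived the selection
AIC(full)          # AIC of the full model
AIC(selected)      # AIC of the selected model (lower is better)
```

Setting `trace = 1` instead prints each elimination step, which shows the order in which variables were dropped.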
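This shows that when weight (wt) and quarter mile time (qsec) are held fixed, the difference between automatic and manual shrinks to about 2.94 mpg. The model explains about 85% of the variation in MPG (adjusted R-squared 0.83). All three predictors are significant at the 5% level, although the transmission effect (p = 0.047) is only narrowly so.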
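A confidence interval expresses the uncertainty in the transmission effect more directly than the p-value alone. The sketch below, which is an addition and not part of the original analysis, refits the selected model on a local copy and extracts the 95% interval for `amManual`. The names `cars`, `final` and `ci_am` are introduced here for illustration.

```r
# Local copy with the same factor coding used in the analysis.
cars <- datasets::mtcars
cars$am <- factor(cars$am, levels = c(0, 1), labels = c("Automatic", "Manual"))

# Refit the model chosen by stepwise selection.
final <- lm(mpg ~ wt + qsec + am, data = cars)

# 95% confidence interval for the manual-vs-automatic coefficient.
ci_am <- confint(final, "amManual", level = 0.95)
ci_am
```

An interval whose lower bound sits close to zero would agree with the borderline p-value in the summary table.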
Let’s plot the residuals and see if this model has any problems. See the appendix for the plot.
Examining the residual plots, the residuals appear approximately normal and roughly centered on zero with no obvious pattern. Therefore we can draw conclusions from our final model.
Manual transmissions get an estimated 2.94 miles per gallon better fuel efficiency than automatic transmissions, on average, when weight and quarter mile time are held constant.
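The visual check can be supplemented with a numeric one. The sketch below is an addition under stated assumptions: it uses the Shapiro-Wilk test as one possible normality check, which the original analysis does not use. The names `cars`, `final`, `res` and `sw` are introduced here for illustration.

```r
# Local copy with the same factor coding used in the analysis.
cars <- datasets::mtcars
cars$am <- factor(cars$am, levels = c(0, 1), labels = c("Automatic", "Manual"))

# Refit the selected model and take its residuals.
final <- lm(mpg ~ wt + qsec + am, data = cars)
res <- resid(final)

mean(res)  # essentially zero for a least-squares fit with an intercept

# Shapiro-Wilk test: a small p-value would suggest non-normal residuals.
sw <- shapiro.test(res)
sw$p.value
```

With only 32 observations the test has limited power, so it complements the diagnostic plots and does not replace them.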
We conclude that manual transmissions have better fuel efficiency than automatic transmissions in this data set. Considered alone, manual cars average 7.25 mpg more than automatic cars. When the effect of other variables on MPG is taken into account, this difference shrinks. After accounting for weight and quarter mile time (wt and qsec), manual transmissions get only an estimated 2.94 mpg more.
From the R Documentation on the “mtcars” data set:
“The data was extracted from the 1974 Motor Trend US magazine, and comprises fuel consumption and 10 aspects of automobile design and performance for 32 automobiles (1973-74 models).”
The mtcars data set contains 32 observations on 11 variables. The description and (variable name) are below.
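The dimensions and variable names can be confirmed from R itself. This is a minimal sketch; `cars` is a local copy introduced here for illustration.

```r
# Local copy of the built-in data set.
cars <- datasets::mtcars

dim(cars)    # number of observations and variables
names(cars)  # variable names
str(cars)    # type and first few values of each variable
```

Running `?mtcars` in an interactive session opens the documentation page with the description of each variable.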
# Plot mpg as a function of being automatic or manual
library(ggplot2)
g <- ggplot(data=mtcars, aes(x=am, y=mpg))
g <- g + geom_point()
print(g)
# Create a 2 by 2 area and plot the residuals of the best model
par(mfrow=c(2,2))
plot(fit3)