The purpose of this assignment is to identify which type of transmission (manual vs automatic) is better for fuel efficiency (mpg), and to quantify this difference, if any. We will be looking at data from the mtcars dataset, extracted from the 1974 Motor Trend US magazine.
We find that there is a significant difference in fuel efficiency between transmission types, though this may be caused by the relationship between transmission type and weight.
The variable in the mtcars dataset that represents transmission is am. This numeric vector is 0 for automatic, and 1 for manual. We will create a corresponding factor transmission that simply applies English labels to the am vector and store it in the mtcars dataset. This will make it easier to understand plots and models.
transmission <- factor(mtcars$am, labels=c('automatic', 'manual'))
mtcars$transmission <- transmission
We can get a simple idea of the distribution of mpg by transmission via a boxplot and a histogram of the data.
Cars with manual transmission tend to have higher mpg than those with automatic.
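The plotting code is not shown in this report. A minimal sketch of how such plots could be produced with base R graphics follows; the copy `cars` and the axis labels are assumptions introduced for illustration, not the original code.

```r
# Sketch only: base-R plots of mpg split by transmission type.
# `cars` is a hypothetical local copy so that mtcars itself is left untouched.
cars <- mtcars
cars$transmission <- factor(cars$am, labels = c("automatic", "manual"))

# Side-by-side boxplots of mpg for each transmission type
boxplot(mpg ~ transmission, data = cars,
        xlab = "Transmission", ylab = "Miles per gallon",
        main = "MPG by Transmission")

# Histogram of mpg across all cars
hist(cars$mpg, breaks = 10,
     xlab = "Miles per gallon", main = "Distribution of MPG")
```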
According to the data, cars with manual transmission tend to get higher mpg than those with automatic transmission, as they have a higher mean mpg.
library(plyr)  # provides ddply
ddply(mtcars, .(transmission), summarize, mean = mean(mpg))
##   transmission     mean
## 1    automatic 17.14737
## 2       manual 24.39231
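The same group means can be computed without plyr. The sketch below uses base R's `tapply`; the names `cars`, `group_means` and `mean_diff` are hypothetical and introduced only for this example.

```r
# Sketch only: group means of mpg by transmission using base R.
cars <- mtcars
cars$transmission <- factor(cars$am, labels = c("automatic", "manual"))

# Mean mpg within each transmission level
group_means <- tapply(cars$mpg, cars$transmission, mean)
print(group_means)

# Raw (unadjusted) difference in means, manual minus automatic
mean_diff <- unname(group_means["manual"] - group_means["automatic"])
print(mean_diff)
```

The difference is unadjusted: it does not account for weight or any other variable.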
We can use a t-test to determine whether or not the difference in mean mpg by transmission is significant.
test <- with(mtcars, t.test(mpg[transmission == 'manual'], mpg[transmission == 'automatic']))
test
##
## Welch Two Sample t-test
##
## data: mpg[transmission == "manual"] and mpg[transmission == "automatic"]
## t = 3.7671, df = 18.332, p-value = 0.001374
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## 3.209684 11.280194
## sample estimates:
## mean of x mean of y
## 24.39231 17.14737
With a p-value of 0.0013736, we can conclude that the mean mpg of manual cars is significantly different from that of automatic cars. Note that this test compares the raw group means only; it does not hold any other variable constant.
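The values quoted from the test can be read directly from the object returned by `t.test` instead of being copied from the printed output. A minimal sketch, assuming a hypothetical local copy `cars`:

```r
# Sketch only: pull individual results out of the Welch t-test object.
cars <- mtcars
cars$transmission <- factor(cars$am, labels = c("automatic", "manual"))

test <- with(cars, t.test(mpg[transmission == "manual"],
                          mpg[transmission == "automatic"]))

p_value <- test$p.value            # two-sided p-value
ci      <- test$conf.int           # 95% CI for (manual mean - automatic mean)
means   <- unname(test$estimate)   # group means: manual, then automatic

print(p_value)
print(ci)
print(means)
```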
We can create a model based solely on these two means.
fit <- lm(mpg ~ transmission - 1, data = mtcars)
summary(fit)
##
## Call:
## lm(formula = mpg ~ transmission - 1, data = mtcars)
##
## Residuals:
## Min 1Q Median 3Q Max
## -9.3923 -3.0923 -0.2974 3.2439 9.5077
##
## Coefficients:
##                       Estimate Std. Error t value Pr(>|t|)
## transmissionautomatic   17.147      1.125   15.25 1.13e-15 ***
## transmissionmanual      24.392      1.360   17.94  < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 4.902 on 30 degrees of freedom
## Multiple R-squared: 0.9487, Adjusted R-squared: 0.9452
## F-statistic: 277.2 on 2 and 30 DF, p-value: < 2.2e-16
The coefficient transmissionautomatic represents the mean mpg for automatic transmission, and transmissionmanual represents the mean mpg for manual. This model simply maps each transmission to its corresponding average mpg. The reported \(R^2\) of 0.949 should be read with care: because the model has no intercept, R computes \(R^2\) relative to zero rather than to the overall mean mpg, so it overstates how much of the variation in mpg is explained by transmission type alone.
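To see how the \(R^2\) depends on the parameterization, the same model can be refit with an intercept. The sketch below is an illustration with hypothetical object names (`cars`, `fit_no_int`, `fit_int`); both fits give identical fitted values, and only the baseline used for \(R^2\) changes.

```r
# Sketch only: compare R^2 with and without an intercept.
cars <- mtcars
cars$transmission <- factor(cars$am, labels = c("automatic", "manual"))

fit_no_int <- lm(mpg ~ transmission - 1, data = cars)  # one mean per group
fit_int    <- lm(mpg ~ transmission, data = cars)      # baseline + difference

# Same fitted values, different R^2 baselines (zero vs. mean of mpg)
r2_no_int <- summary(fit_no_int)$r.squared
r2_int    <- summary(fit_int)$r.squared
print(c(no_intercept = r2_no_int, with_intercept = r2_int))

# With an intercept, the transmission coefficient is the difference in means
print(coef(fit_int))
```

The intercept version reports how much variation around the mean mpg is explained by transmission, which is usually the quantity of interest.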
The residuals appear to be normally distributed as well, and sum very close to zero.
round(sum(resid(fit)), 5) == 0
## [1] TRUE
plot(resid(fit))
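A plot of residuals against index shows their spread but says little about normality. One possible complementary check, sketched here under the assumption that a normal Q-Q plot and a Shapiro-Wilk test are acceptable diagnostics for this analysis, is:

```r
# Sketch only: normality diagnostics for the residuals of the means-only model.
cars <- mtcars
cars$transmission <- factor(cars$am, labels = c("automatic", "manual"))
fit <- lm(mpg ~ transmission - 1, data = cars)

res <- resid(fit)

# Points close to the reference line suggest approximately normal residuals
qqnorm(res)
qqline(res)

# Shapiro-Wilk test: a small p-value is evidence against normality
sw <- shapiro.test(res)
print(sw$p.value)
```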
We can conclude that the mean mpg by transmission is a reasonable baseline estimate of mpg.
It is worth investigating whether or not there exists a better multivariate model to predict mpg. To select such a model, we will use R’s built-in step function, which performs stepwise selection based on AIC.
stepfit = step(lm(mpg ~ . - 1, data = mtcars), trace = 0)
summary(stepfit)
##
## Call:
## lm(formula = mpg ~ wt + qsec + transmission - 1, data = mtcars)
##
## Residuals:
## Min 1Q Median 3Q Max
## -3.4811 -1.5555 -0.7257 1.4110 4.6610
##
## Coefficients:
##                       Estimate Std. Error t value Pr(>|t|)
## wt                     -3.9165     0.7112  -5.507 6.95e-06 ***
## qsec                    1.2259     0.2887   4.247 0.000216 ***
## transmissionautomatic   9.6178     6.9596   1.382 0.177915
## transmissionmanual     12.5536     6.0573   2.072 0.047543 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.459 on 28 degrees of freedom
## Multiple R-squared: 0.9879, Adjusted R-squared: 0.9862
## F-statistic: 573.7 on 4 and 28 DF, p-value: < 2.2e-16
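This model adjusts for weight (wt, in units of 1000 lbs) and quarter-mile time (qsec, in seconds) as potential confounders. For both types of transmission, a unit increase in wt is associated with a decrease in the fitted mpg of 3.9165, and a unit increase in qsec is associated with an increase in the fitted mpg of 1.2259. The difference between the two transmission coefficients, 12.5536 - 9.6178 = 2.9358, is the estimated mpg advantage of manual over automatic at the same weight and quarter-mile time.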
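Because this model has no intercept, the summary tests each transmission coefficient against zero rather than testing the two transmission types against each other. A minimal sketch of the equivalent model with an intercept follows (the name `adj_fit` is hypothetical); in that form the `transmissionmanual` row is the adjusted manual-minus-automatic difference together with its own test.

```r
# Sketch only: same model, reparameterized so that one coefficient is the
# adjusted difference between manual and automatic transmissions.
cars <- mtcars
cars$transmission <- factor(cars$am, labels = c("automatic", "manual"))

adj_fit <- lm(mpg ~ wt + qsec + transmission, data = cars)

# Estimate, standard error, t value and p-value for the adjusted difference
adj_row <- coef(summary(adj_fit))["transmissionmanual", ]
print(adj_row)

# 95% confidence interval for the adjusted difference
print(confint(adj_fit)["transmissionmanual", ])
```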
This model too has normally distributed residuals that sum to approximately zero.
round(sum(resid(stepfit)), 5) == 0
## [1] TRUE
plot(resid(stepfit))
Furthermore, since it has an \(R^2\) value of 0.988 and a smaller residual spread (the residuals now range from about -3.5 to 4.7 as opposed to -9.4 to 9.5, and the residual standard error falls from 4.902 to 2.459), it appears to be an improvement on our previous model. We can use anova to test whether the two models are significantly different.
anova(fit, stepfit)
## Analysis of Variance Table
##
## Model 1: mpg ~ transmission - 1
## Model 2: mpg ~ wt + qsec + transmission - 1
##   Res.Df    RSS Df Sum of Sq      F   Pr(>F)
## 1     30 720.90
## 2     28 169.29  2    551.61 45.618 1.55e-09 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
With a p-value of 1.55e-09, we can conclude that the multivariable model, which adjusts for weight and quarter-mile time, fits significantly better than the single-variable model.
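The F statistic in the table can be reproduced by hand from the residual sums of squares of the two nested models, \(F = \frac{(RSS_1 - RSS_2)/(df_1 - df_2)}{RSS_2/df_2}\). The sketch below refits both models under hypothetical names (`small`, `large`) and applies that formula.

```r
# Sketch only: reproduce the nested-model F test from the two RSS values.
cars <- mtcars
cars$transmission <- factor(cars$am, labels = c("automatic", "manual"))

small <- lm(mpg ~ transmission - 1, data = cars)
large <- lm(mpg ~ wt + qsec + transmission - 1, data = cars)

rss1 <- deviance(small); df1 <- df.residual(small)  # RSS and residual df
rss2 <- deviance(large); df2 <- df.residual(large)

# F = (drop in RSS per added parameter) / (residual variance of larger model)
f_stat <- ((rss1 - rss2) / (df1 - df2)) / (rss2 / df2)
p_val  <- pf(f_stat, df1 - df2, df2, lower.tail = FALSE)

print(c(F = f_stat, p = p_val))
```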
One downside of this model is that it includes quarter-mile time. Unlike weight and transmission, this feature is dependent on the construction of the car. A designer can choose from the start of a project what the desired weight and transmission type of the vehicle should be, but quarter-mile time will most likely be an outcome of these decisions, and therefore would not be very useful in helping a car company identify how to build more fuel-efficient cars.
Additionally, weight and transmission type are strongly related. As can be seen in the plot below, lighter cars tend to have manual transmission and heavier cars tend to have automatic transmission.
library(ggplot2)  # provides ggplot, aes, geom_point, ggtitle
ggplot(data = mtcars, aes(x = wt, y = mpg, color = transmission)) +
  geom_point() +
  ggtitle('MPG by Weight and Transmission')
In fact, for the region of the graph where the two transmission types overlap (approximately from 2.5 to 3.5 on the X-axis), manual cars appear to have lower mpg than automatic cars, contradicting our previous assertion. It appears likely that our previous conclusion was influenced too strongly by the relationship between weight and transmission - that is, we identified automatic transmissions as being less fuel-efficient simply because automatic cars tend to be heavier. In order to truly identify which transmission type is more fuel efficient, we would have to randomize on weight, or at least compare cars of similar weight.
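One rough way to examine the overlapping region numerically is to restrict the data to that weight range and compare the group means there. The sketch below is an illustration only: the cut-offs 2.5 and 3.5 are taken from the approximate range read off the plot, the name `overlap` is hypothetical, and the subset is small, so any difference it shows should be treated as suggestive rather than conclusive.

```r
# Sketch only: compare transmissions among cars of broadly similar weight.
cars <- mtcars
cars$transmission <- factor(cars$am, labels = c("automatic", "manual"))

# Keep only cars in the weight range where both transmission types occur
overlap <- subset(cars, wt >= 2.5 & wt <= 3.5)

# Number of cars of each type in the overlap region
counts <- table(overlap$transmission)
print(counts)

# Mean mpg and mean weight by transmission within the overlap region
print(tapply(overlap$mpg, overlap$transmission, mean))
print(tapply(overlap$wt, overlap$transmission, mean))
```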