We are asked the same questions again and again: “Is an automatic or manual transmission better for MPG (miles per gallon)?” and “Can you show me the quantitative MPG difference between automatic and manual transmissions?” These questions come up whenever readers are choosing a car and want to save money on gasoline. In this document we answer them based on our data.
This supplement is published with our monthly magazine, Motor Trend; an online version is also available on RPubs.
First, we model the relationship between transmission and MPG with a linear regression and find that manual transmission is better for MPG. Second, we dig deeper into the data to quantify the MPG difference between the two main transmission types. After analysing transmission on its own, we build new models with additional variables to find out which other variables help increase MPG.
In this part, we fit a regression model of MPG on transmission. Below are the first 6 records of the data.
data(mtcars)
head(mtcars)
## mpg cyl disp hp drat wt qsec vs am gear carb
## Mazda RX4 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4
## Mazda RX4 Wag 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4
## Datsun 710 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1
## Hornet 4 Drive 21.4 6 258 110 3.08 3.215 19.44 1 0 3 1
## Hornet Sportabout 18.7 8 360 175 3.15 3.440 17.02 0 0 3 2
## Valiant 18.1 6 225 105 2.76 3.460 20.22 1 0 3 1
Here, the leftmost column shows each car’s model, and the other columns are properties of that model. The am variable is the transmission type (0 = automatic, 1 = manual) and, as its name suggests, the mpg column is the MPG.
Now, let’s draw a basic box plot to show the distribution of MPG (mpg) for each transmission type (am), with a regression line to show the general relationship between MPG (mpg) and transmission (am).
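To make the 0/1 coding easier to read, one option is to attach labels to am and count the cars in each group. This is only a minimal sketch: the names `mt` and `transmission` are our own illustrative choices and are not part of the dataset.

```r
data(mtcars)
mt <- mtcars   # work on a copy; `mt` is a hypothetical name

# Hypothetical helper column: a labelled version of the 0/1 am variable
mt$transmission <- factor(mt$am, levels = c(0, 1),
                          labels = c("automatic", "manual"))

# Number of cars in each transmission group
counts <- table(mt$transmission)
print(counts)
```

The group sizes are worth keeping in mind when reading the comparisons that follow, since the two groups are not the same size.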
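fit_am <- lm(mpg ~ am, data = mtcars)
plot(factor(mtcars$am), mtcars$mpg, xlab = "Transmission (0 = automatic, 1 = manual)", ylab = "MPG")
# The boxes are drawn at x = 1 and x = 2, so shift the fitted line one unit to the right
abline(a = coef(fit_am)[1] - coef(fit_am)[2], b = coef(fit_am)[2], col = "red", lwd = 3)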
There is an obvious difference between the two transmission groups (0 for automatic, 1 for manual) when we compare their highest, middle, and lowest values. At every level, the manual transmission cars have the higher MPG.
The trend line also has a positive slope, which means that when transmission increases by one unit (from 0 to 1, that is, from automatic to manual), the MPG value increases.
fit=lm(mtcars$mpg~factor(mtcars$am))
fit
##
## Call:
## lm(formula = mtcars$mpg ~ factor(mtcars$am))
##
## Coefficients:
## (Intercept) factor(mtcars$am)1
## 17.147 7.245
Here, the intercept 17.147 is the mean MPG of the baseline group, the automatic cars (am = 0). The slope 7.245 is the change in mean MPG when moving from automatic to manual; in other words, manual transmission cars average about 7.245 MPG more than automatic cars.
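One way to read these two coefficients is to compare them with the raw group means. The sketch below assumes the same mtcars data; `fit_am`, `group_means`, and `mean_diff` are illustrative names. With a single 0/1 predictor, the intercept should equal the mean MPG of the automatic cars and the slope should equal the difference between the two group means.

```r
data(mtcars)
fit_am <- lm(mpg ~ factor(am), data = mtcars)

# Mean MPG per transmission group; the result is named "0" and "1"
group_means <- tapply(mtcars$mpg, mtcars$am, mean)
mean_diff <- unname(group_means["1"] - group_means["0"])

# Group means and their difference ...
print(round(c(automatic = unname(group_means["0"]),
              manual = unname(group_means["1"]),
              difference = mean_diff), 3))
# ... next to the fitted intercept and slope
print(round(coef(fit_am), 3))
```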
sumCoef <- summary(fit)$coefficients
sumCoef[2, 1] + c(-1, 1) * qt(.975, df = fit$df.residual) * sumCoef[2, 2]
## [1] 3.64151 10.84837
The 95% confidence interval for the MPG difference is 3.64151 to 10.84837. Because the whole interval lies above zero, we can be confident in the conclusion that manual transmission cars have a higher MPG than automatic ones.
Now, let’s plot the residuals against transmission.
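The same interval can also be obtained with the built-in `confint()` function, which avoids indexing the coefficient table by position. A minimal sketch, assuming the model is refitted with `data = mtcars` so that the coefficient is named `factor(am)1`; `fit_am`, `ci`, and `manual_ci` are illustrative names:

```r
data(mtcars)
fit_am <- lm(mpg ~ factor(am), data = mtcars)

# 95% confidence interval for the manual-vs-automatic difference
ci <- confint(fit_am, level = 0.95)["factor(am)1", ]
print(ci)

# The same interval computed by hand, for comparison
est <- summary(fit_am)$coefficients["factor(am)1", ]
manual_ci <- est["Estimate"] +
  c(-1, 1) * qt(0.975, df.residual(fit_am)) * est["Std. Error"]
print(unname(manual_ci))
```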
plot(mtcars$am, resid(lm(mtcars$mpg ~ factor(mtcars$am))))
As the plot shows, the residuals are widely scattered for both transmissions (roughly -10 to 10 for manual and -7.5 to 7.5 for automatic). This suggests that MPG is also influenced by variables not in our model, so let’s investigate further.
Next, we introduce other variables alongside transmission. Because a model with more than two variables would confuse readers rather than help them, our goal is to find the single most useful variable to pair with transmission.
#### Variables chosen
Here are the candidate variables that could influence MPG.
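The spread of the residuals can also be summarised numerically. The sketch below is one possible check, not part of the original analysis: it prints the residual range per transmission group and the share of MPG variance explained by transmission alone (R squared). The names `fit_am`, `res`, `res_range`, and `r2` are illustrative.

```r
data(mtcars)
fit_am <- lm(mpg ~ factor(am), data = mtcars)
res <- resid(fit_am)

# Smallest and largest residual within each transmission group
res_range <- tapply(res, mtcars$am, range)
print(res_range)

# Share of the variance in mpg explained by transmission alone
r2 <- summary(fit_am)$r.squared
print(round(r2, 3))
```

An R squared well below 1 is consistent with the wide scatter seen in the plot: transmission alone leaves much of the variation in MPG unexplained.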
wt - Car Weight (lb/1000)
gear - Number of forward gears
carb - Number of carburetors
hp - Gross horsepower
cyl - Number of cylinders
fit0<-lm(mpg ~ factor(am) , data = mtcars)
fit1<-lm(mpg ~ factor(am)+wt , data = mtcars)
fit2<-lm(mpg ~ factor(am)+gear , data = mtcars)
fit3<-lm(mpg ~ factor(am)+carb , data = mtcars)
fit4<-lm(mpg ~ factor(am)+hp , data = mtcars)
fit5<-lm(mpg ~ factor(am)+factor(cyl) , data = mtcars)
at1<-anova(fit1);at2<-anova(fit2);at3<-anova(fit3);at4<-anova(fit4);at5<-anova(fit5)
A variable with a P-value above 5% does not significantly improve the fit when added to the model, so it is not worth introducing.
at1$Pr[2];at2$Pr[2];at3$Pr[2];at4$Pr[2];at5$Pr[2]
## [1] 1.867415e-07
## [1] 0.9651278
## [1] 2.752235e-06
## [1] 2.920375e-08
## [1] 8.010109e-07
From the result, we see that every candidate except the second one, gear, may influence MPG; that leaves 4 variables.
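The same screening can be written as a loop that compares the transmission-only model with each two-variable model through a nested F-test. This is a sketch under our own naming (`base_fit`, `candidates`, `p_values` are illustrative); for the last term added to a model, the nested comparison gives the same P-value as the sequential `anova()` table used above.

```r
data(mtcars)
base_fit <- lm(mpg ~ factor(am), data = mtcars)

# Candidate second terms; cyl enters as a factor, as in fit5
candidates <- c(wt = "wt", gear = "gear", carb = "carb",
                hp = "hp", cyl = "factor(cyl)")

p_values <- sapply(candidates, function(term) {
  f <- reformulate(c("factor(am)", term), response = "mpg")
  bigger <- lm(f, data = mtcars)
  # F-test: does adding `term` improve on the transmission-only model?
  anova(base_fit, bigger)[["Pr(>F)"]][2]
})

print(signif(p_values, 3))
# Candidates that fail the 5% threshold
print(names(p_values)[p_values > 0.05])
```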
summary(fit1)$coef;summary(fit3)$coef;summary(fit4)$coef;summary(fit5)$coef
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 37.32155131 3.0546385 12.21799285 5.843477e-13
## factor(am)1 -0.02361522 1.5456453 -0.01527855 9.879146e-01
## wt -5.35281145 0.7882438 -6.79080719 1.867415e-07
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 23.145836 1.294133 17.885213 3.315382e-17
## factor(am)1 7.653119 1.222958 6.257873 7.870255e-07
## carb -2.191748 0.377814 -5.801129 2.752235e-06
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 26.5849137 1.425094292 18.654845 1.073954e-17
## factor(am)1 5.2770853 1.079540576 4.888270 3.460318e-05
## hp -0.0588878 0.007856745 -7.495191 2.920375e-08
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 24.801852 1.322615 18.752135 2.182425e-17
## factor(am)1 2.559954 1.297579 1.972869 5.845717e-02
## factor(cyl)6 -6.156118 1.535723 -4.008612 4.106131e-04
## factor(cyl)8 -10.067560 1.452082 -6.933187 1.546574e-07
Let’s compare the results for these 4 variables. With weight (wt) in the model, the transmission coefficient drops to about -0.02 and even changes sign, which cancels the transmission effect, so we leave weight out of our model. The 3rd variable, carb, barely changes the transmission effect: the transmission slope is about 7, the same as in the original model that contains transmission only. The 4th variable, hp, has a very small slope (about -0.06 MPG per unit of horsepower). The last variable, cyl, has a large impact: compared with 4-cylinder cars, 6-cylinder cars get about 6.16 MPG less and 8-cylinder cars about 10.07 MPG less. That means the number of cylinders is significant for MPG: the more cylinders a car has, the lower its MPG.
Now we can answer the most frequently asked question with confidence: manual transmission cars do get a higher MPG. Beyond that, cars with fewer cylinders get a higher MPG as well. We hope this helps when you are choosing your car.
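To see side by side how the transmission coefficient changes with each added variable, the models can be refitted in a loop and the `factor(am)1` coefficient extracted from each. A minimal sketch with illustrative names (`second_terms`, `am_effect`); the `none` entry is the transmission-only model.

```r
data(mtcars)

# "" stands for the transmission-only model
second_terms <- c(none = "", wt = "wt", carb = "carb",
                  hp = "hp", cyl = "factor(cyl)")

am_effect <- sapply(second_terms, function(term) {
  rhs <- if (nzchar(term)) c("factor(am)", term) else "factor(am)"
  m <- lm(reformulate(rhs, response = "mpg"), data = mtcars)
  # Estimated manual-vs-automatic difference in this model
  unname(coef(m)["factor(am)1"])
})

print(round(am_effect, 2))
```

The printed values correspond to the `factor(am)1` rows of the coefficient tables above.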