There is a common belief that cars with a manual transmission are more fuel-efficient than automatics. Using 1974 data from Motor Trend, we can examine this claim with linear regression. Based on the analysis presented below, manual transmission cars get on average 7.24 more miles per gallon than automatic cars. Although the effect shrinks to 2.94 miles per gallon, it remains significant at the 5% significance level even when controlling for the weight and acceleration of the vehicle.
We start our analysis by examining the claim that manual transmission cars have better gas mileage than automatic cars. We can load the data and do some basic analysis in R on this dataset.
data(mtcars)
str(mtcars)
## 'data.frame': 32 obs. of 11 variables:
## $ mpg : num 21 21 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2 ...
## $ cyl : num 6 6 4 6 8 6 8 4 4 6 ...
## $ disp: num 160 160 108 258 360 ...
## $ hp : num 110 110 93 110 175 105 245 62 95 123 ...
## $ drat: num 3.9 3.9 3.85 3.08 3.15 2.76 3.21 3.69 3.92 3.92 ...
## $ wt : num 2.62 2.88 2.32 3.21 3.44 ...
## $ qsec: num 16.5 17 18.6 19.4 17 ...
## $ vs : num 0 0 1 1 0 1 0 1 1 1 ...
## $ am : num 1 1 1 0 0 0 0 0 0 0 ...
## $ gear: num 4 4 4 3 3 3 3 4 4 4 ...
## $ carb: num 4 4 1 1 2 1 4 2 2 4 ...
library(ggplot2)
Specifically, our variables of interest are mpg and am, which is coded as 0 for automatic transmissions and 1 for manual transmissions. We can fit the regression
\[\tiny (1) \normalsize \hspace{1cm} Y_i = \beta_0 + \beta_1 X_i + \epsilon_i\]
where \(X_i\) is the am variable in the mtcars dataset. Ordinary least squares regression gives \(\beta_0\) as the average mpg for automatics (\(X_i = 0\)), and \(\beta_1\) as the additional average miles per gallon of a manual transmission. Looking at just the transmission, we can see that automatic transmission vehicles average 17.15 miles per gallon, and manuals get an additional 7.24 miles per gallon on average (24.39 total).
model <- lm(mpg ~ am, data=mtcars)
summary(model)$coef
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 17.147368 1.124603 15.247492 1.133983e-15
## am 7.244939 1.764422 4.106127 2.850207e-04
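Because am is a 0/1 dummy, the coefficients of equation (1) can be cross-checked against the raw group means. The following is a minimal sketch of that check; the names `fit1`, `mean_auto`, and `mean_manual` are introduced here for illustration and are not part of the original analysis.

```r
data(mtcars)

# Mean mpg within each transmission group (0 = automatic, 1 = manual)
mean_auto   <- mean(mtcars$mpg[mtcars$am == 0])
mean_manual <- mean(mtcars$mpg[mtcars$am == 1])

# Refit equation (1) under a separate name so `model` is left untouched
fit1 <- lm(mpg ~ am, data = mtcars)

# The intercept should match the automatic mean,
# and the slope should match the difference in group means
all.equal(coef(fit1)[["(Intercept)"]], mean_auto)
all.equal(coef(fit1)[["am"]], mean_manual - mean_auto)
```

Both comparisons should return TRUE, which is why the intercept can be read as the automatic average and the slope as the manual premium.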
Before jumping to the conclusion that manual transmissions are more fuel efficient, we should try to control for other variables in the dataset. \[\tiny(2) \normalsize \hspace{1cm} Y_i = \beta_0 + \beta_1 X_i + \sum_{j \geq 2} \beta_j X_{ij} + \epsilon_i\]
where \(\beta_1\) is the coefficient for the dummy variable am and the sum of the terms \(\beta_j X_{ij}\) covers all other variables.
model <- lm(mpg ~ ., data=mtcars)
summary(model)$coef
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 12.30337416 18.71788443 0.6573058 0.51812440
## cyl -0.11144048 1.04502336 -0.1066392 0.91608738
## disp 0.01333524 0.01785750 0.7467585 0.46348865
## hp -0.02148212 0.02176858 -0.9868407 0.33495531
## drat 0.78711097 1.63537307 0.4813036 0.63527790
## wt -3.71530393 1.89441430 -1.9611887 0.06325215
## qsec 0.82104075 0.73084480 1.1234133 0.27394127
## vs 0.31776281 2.10450861 0.1509915 0.88142347
## am 2.52022689 2.05665055 1.2254035 0.23398971
## gear 0.65541302 1.49325996 0.4389142 0.66520643
## carb -0.19941925 0.82875250 -0.2406258 0.81217871
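One plausible reason the am coefficient drops from 7.24 to 2.52 and loses significance in the full model is that transmission type is correlated with several of the other predictors, so they compete to explain the same variation in mpg. The sketch below is one way to look at this; the names `predictors` and `am_cor` are illustrative and this check is not part of the original analysis.

```r
data(mtcars)

# Every predictor other than the response (mpg) and the dummy itself (am)
predictors <- setdiff(names(mtcars), c("mpg", "am"))

# Pearson correlation of the transmission dummy with each remaining predictor
am_cor <- sapply(predictors, function(v) cor(mtcars$am, mtcars[[v]]))

# Sort by absolute strength so the most entangled predictors come first
round(am_cor[order(-abs(am_cor))], 2)
```

A negative correlation with wt, for example, would indicate that the manual cars in this sample tend to be lighter, which by itself would raise their mpg.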
Using the AIC function in R, we systematically removed variables, one at a time, until the AIC stopped decreasing. Below is a subset of the models reviewed:
AIC(lm(mpg ~ wt + cyl + disp + hp + drat + qsec + vs + am + gear + carb,data=mtcars))
## [1] 163.7098
AIC(lm(mpg ~ wt + disp + hp + drat + qsec + am + gear + carb,data=mtcars))
## [1] 159.7853
AIC(lm(mpg ~ wt + disp + hp + drat + qsec + am,data=mtcars))
## [1] 156.2687
AIC(lm(mpg ~ wt + qsec + am,data=mtcars))
## [1] 154.1194
# Removing any further variable raises the AIC
AIC(lm(mpg ~ qsec + am,data=mtcars))
## [1] 175.6022
AIC(lm(mpg ~ wt + am,data=mtcars))
## [1] 168.0292
AIC(lm(mpg ~ wt + qsec,data=mtcars))
## [1] 156.7205
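The manual search above can be approximated with base R's `step()` function, which performs backward elimination automatically. This is a sketch rather than the procedure the analysis actually followed; the names `full_model` and `selected` are illustrative. Note that `step()` ranks models with `extractAIC()`, which differs from `AIC()` by an additive constant for models fit to the same data, so the ordering of models is the same.

```r
data(mtcars)

# Start from the model containing every predictor
full_model <- lm(mpg ~ ., data = mtcars)

# Backward elimination: drop one term at a time while the AIC keeps falling
selected <- step(full_model, direction = "backward", trace = 0)

# Inspect which terms survived and the AIC on the same scale as AIC() above
formula(selected)
AIC(selected)
```

On this dataset the search should arrive at the same wt + qsec + am model found by hand.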
After controlling for weight and quarter-mile time, transmission type is still a statistically significant predictor of gas mileage at the 5% level, but not at the 1% level. Furthermore, its magnitude (2.94 mpg) is smaller than the effect of weight (-3.92 mpg per 1000 lbs) once the other two variables are controlled for. Our final model is \[\tiny(3) \normalsize \hspace{1cm} Y_i = \beta_0 + \beta_1 wt_i + \beta_2 qsec_i + \beta_3 am_i + \epsilon_i\]
where wt is the car weight (in 1000 lbs), qsec is the time to drive one quarter mile (in seconds), and am is a dummy variable where 0=automatic and 1=manual. This is only one of several possible model selection methods, and it is notably limited in that it does not account for possible interaction terms or polynomial terms.
model<-lm(mpg ~ wt + qsec + am,data=mtcars)
summary(model)$coef
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 9.617781 6.9595930 1.381946 1.779152e-01
## wt -3.916504 0.7112016 -5.506882 6.952711e-06
## qsec 1.225886 0.2886696 4.246676 2.161737e-04
## am 2.935837 1.4109045 2.080819 4.671551e-02
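The statement that am is significant at the 5% level but not at the 1% level can also be read off confidence intervals for its coefficient: an interval that excludes zero corresponds to significance at the matching level. A minimal sketch, where `final_model`, `ci95`, and `ci99` are names chosen here for illustration:

```r
data(mtcars)

# Refit equation (3) under a separate name
final_model <- lm(mpg ~ wt + qsec + am, data = mtcars)

# 95% and 99% confidence intervals for the transmission coefficient
ci95 <- confint(final_model, "am", level = 0.95)
ci99 <- confint(final_model, "am", level = 0.99)

ci95
ci99
```

The 95% interval lies entirely above zero while the 99% interval includes zero, which matches the p-value of about 0.047 reported in the coefficient table.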
The validity of a linear regression model requires that the variance of the residuals be constant and that the residuals themselves are not closely correlated with the predictor variables used in the equation. We tested these assumptions using a residual plot and by regressing the residuals against the predictor variables.
qplot(x=mtcars$wt,y=model$resid) + ylim(-5,5) + xlab("Car Weight") + ylab("Residual") + ggtitle("Residual Plot for Regression Equation (3)")
summary(lm(model$resid~wt+qsec+am,data=mtcars))
##
## Call:
## lm(formula = model$resid ~ wt + qsec + am, data = mtcars)
##
## Residuals:
## Min 1Q Median 3Q Max
## -3.4811 -1.5555 -0.7257 1.4110 4.6610
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -3.561e-15 6.960e+00 0 1
## wt 2.661e-16 7.112e-01 0 1
## qsec 1.422e-16 2.887e-01 0 1
## am 5.096e-16 1.411e+00 0 1
##
## Residual standard error: 2.459 on 28 degrees of freedom
## Multiple R-squared: 2.778e-32, Adjusted R-squared: -0.1071
## F-statistic: 2.592e-31 on 3 and 28 DF, p-value: 1
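Based on the residual plot, we conclude there is no evidence of heteroskedasticity. Note that the regression of the residuals on the predictors returns coefficients of essentially zero by construction, because least squares residuals are orthogonal to the predictors, so that output alone does not test for non-constant variance.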
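A check that can actually detect non-constant variance regresses the squared residuals, rather than the raw residuals, on the predictors. The sketch below computes a studentized Breusch-Pagan-style statistic using only base R. It is not a step the original analysis performed, and the names `final_model`, `sq_resid`, `aux`, `bp_stat`, and `bp_pval` are illustrative.

```r
data(mtcars)

# Refit equation (3) under a separate name
final_model <- lm(mpg ~ wt + qsec + am, data = mtcars)

# Auxiliary regression: squared residuals on the same predictors
sq_resid <- resid(final_model)^2
aux <- lm(sq_resid ~ wt + qsec + am, data = mtcars)

# Test statistic n * R^2, compared to a chi-squared distribution
# with 3 degrees of freedom (one per predictor) under constant variance
bp_stat <- nobs(aux) * summary(aux)$r.squared
bp_pval <- pchisq(bp_stat, df = 3, lower.tail = FALSE)

c(statistic = bp_stat, p.value = bp_pval)
```

A large p-value would be consistent with constant residual variance, while a small one would suggest the variance changes with weight, quarter-mile time, or transmission type.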
After controlling for other factors and selecting the model using the Akaike Information Criterion (AIC), we conclude that manual transmission vehicles do, on average, get better gas mileage by 2.94 mpg, an effect that is significant at the 5% significance level. This relationship between mpg, transmission, and weight is illustrated by the following graph.
q<-qplot(x=wt,y=mpg,data=mtcars,colour=as.factor(am))
q<-q + scale_color_discrete(name="Transmission",labels=c("Automatic","Manual"))
q<-q + ggtitle("Miles per Gallon by Weight and Transmission")
q<-q + xlab("Weight (1000lbs)") + ylab("Miles Per Gallon")
q
We can see clearly that automatics tend to be both heavier and have lower gas mileage, but comparably weighted manual transmission vehicles still tend to have slightly better gas mileage, as estimated by equation (3).