Whether an automatic or a manual transmission car offers better fuel economy, and by how much, is one of the most debated questions among car owners. In this study, we investigated the relationship between fuel economy, measured in miles per gallon (mpg), and transmission type (automatic vs. manual) using the mtcars data extracted from the 1974 Motor Trend US magazine.
A multiple linear regression model containing the weight, acceleration and transmission type of the car, together with the respective interaction terms, was selected. The analysis shows no significant relationship between the transmission type of a car and its fuel efficiency.
However, it is interesting to note that the weight of the car has a bigger negative effect on fuel efficiency in manual transmission cars (fuel efficiency falls by 6.75 miles per US gallon for every 1000 lbs increase in weight) than in automatic transmission cars (fuel efficiency falls by 3.00 miles per US gallon for every 1000 lbs increase in weight). Acceleration also matters for automatic transmission cars: for every additional second an automatic car needs to cover a quarter mile from standstill (that is, the slower it accelerates), fuel efficiency increases by 0.95 miles per US gallon.
Load the mtcars dataset. A detailed description of the dataset can be found in its R help page (?mtcars).
library("datasets")
data("mtcars")
A quick view of the mtcars dataset.
str(mtcars)
## 'data.frame': 32 obs. of 11 variables:
## $ mpg : num 21 21 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2 ...
## $ cyl : num 6 6 4 6 8 6 8 4 4 6 ...
## $ disp: num 160 160 108 258 360 ...
## $ hp : num 110 110 93 110 175 105 245 62 95 123 ...
## $ drat: num 3.9 3.9 3.85 3.08 3.15 2.76 3.21 3.69 3.92 3.92 ...
## $ wt : num 2.62 2.88 2.32 3.21 3.44 ...
## $ qsec: num 16.5 17 18.6 19.4 17 ...
## $ vs : num 0 0 1 1 0 1 0 1 1 1 ...
## $ am : num 1 1 1 0 0 0 0 0 0 0 ...
## $ gear: num 4 4 4 3 3 3 3 4 4 4 ...
## $ carb: num 4 4 1 1 2 1 4 2 2 4 ...
Prepare the mtcars dataset for analysis.
# vs: engine shape (0 = V-shaped, 1 = straight)
# am: transmission type (0 = automatic, 1 = manual)
mtcars$vs <- as.factor(mtcars$vs)
mtcars$am <- as.factor(mtcars$am)
A correlation matrix is generated from the dataset for a quick visualization and inspection of the data. The number of forward gears (gear) and the number of carburetors (carb) are removed to avoid cluttering the correlation matrix, as these two variables are unlikely to provide any interesting insights into the relationship between transmission type (am) and fuel efficiency (mpg).
library("GGally")
ggpairs(data=mtcars[,1:9], lower = list(continuous ="smooth"))
According to the correlation matrix, a high degree of collinearity between the variables is evident.
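The collinearity can also be checked numerically. The following is a minimal sketch rather than part of the original analysis: it works on a fresh copy of the data (`dat` is a hypothetical name) and uses an arbitrary cutoff of 0.8 on the absolute Pearson correlation to list strongly related pairs.

```r
# Fresh copy so the factor conversions applied to mtcars are not affected.
dat <- datasets::mtcars

# Continuous (or count) variables only; vs and am are treated as factors elsewhere.
num_vars <- c("mpg", "cyl", "disp", "hp", "drat", "wt", "qsec")
cor_mat <- cor(dat[, num_vars])

# Keep each pair once (upper triangle) where |r| exceeds the assumed 0.8 cutoff.
high <- which(abs(cor_mat) > 0.8 & upper.tri(cor_mat), arr.ind = TRUE)
high_pairs <- data.frame(
  var1 = rownames(cor_mat)[high[, "row"]],
  var2 = colnames(cor_mat)[high[, "col"]],
  r    = round(cor_mat[high], 2)
)
print(high_pairs)
```

The cutoff is only a convenience for reading the matrix; a different threshold would flag a different set of pairs.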
For the purpose of this study, we will analyse the data by constructing two different linear models. Model A will be constructed solely using stepwise regression with the Akaike Information Criterion (AIC). As the variable of interest, am (transmission type), is a factor variable, we will construct Model B using the predictors selected for Model A and adding their interaction terms with am.
library("MASS")
library("car")
fit_A <- lm(mpg~.,mtcars)
model_A <- stepAIC(fit_A, direction = "both", trace =0)
summary(model_A)
##
## Call:
## lm(formula = mpg ~ wt + qsec + am, data = mtcars)
##
## Residuals:
## Min 1Q Median 3Q Max
## -3.4811 -1.5555 -0.7257 1.4110 4.6610
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 9.6178 6.9596 1.382 0.177915
## wt -3.9165 0.7112 -5.507 6.95e-06 ***
## qsec 1.2259 0.2887 4.247 0.000216 ***
## am1 2.9358 1.4109 2.081 0.046716 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.459 on 28 degrees of freedom
## Multiple R-squared: 0.8497, Adjusted R-squared: 0.8336
## F-statistic: 52.75 on 3 and 28 DF, p-value: 1.21e-11
extractAIC(model_A)
## [1] 4.0000 61.3073
Some diagnostics on Model A.
par(mfrow=c(2,2))
plot(model_A)
outlierTest(model_A)
##
## No Studentized residuals with Bonferonni p < 0.05
## Largest |rstudent|:
## rstudent unadjusted p-value Bonferonni p
## Chrysler Imperial 2.323119 0.027949 0.89437
model_B <- lm(mpg~(wt+qsec)*am, data=mtcars)
summary(model_B)
##
## Call:
## lm(formula = mpg ~ (wt + qsec) * am, data = mtcars)
##
## Residuals:
## Min 1Q Median 3Q Max
## -3.6832 -1.3222 -0.3747 1.0687 4.0907
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 11.2489 6.9922 1.609 0.119738
## wt -2.9963 0.6910 -4.336 0.000194 ***
## qsec 0.9454 0.3067 3.082 0.004813 **
## am1 8.9265 12.6662 0.705 0.487232
## wt:am1 -3.7581 1.5158 -2.479 0.019969 *
## qsec:am1 0.2355 0.5566 0.423 0.675630
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.117 on 26 degrees of freedom
## Multiple R-squared: 0.8966, Adjusted R-squared: 0.8767
## F-statistic: 45.07 on 5 and 26 DF, p-value: 5.394e-12
extractAIC(model_B)
## [1] 6.00000 53.34182
Some diagnostics on Model B.
par(mfrow=c(2,2))
plot(model_B)
outlierTest(model_B)
##
## No Studentized residuals with Bonferonni p < 0.05
## Largest |rstudent|:
## rstudent unadjusted p-value Bonferonni p
## Fiat 128 2.384767 0.024991 0.79972
Now let’s evaluate the two models, Model A and Model B. Model A has three predictors: qsec, the quarter-mile time, which serves as a measure of acceleration; wt, the weight of the car; and am, the transmission type of the vehicle. Intuitively, we would expect hp (horsepower), drat (rear axle ratio) and disp (displacement) to affect the fuel efficiency of the car as well. However, as these predictors are also highly correlated with qsec (which is itself largely determined by them), the selection of qsec, wt and am is reasonable for the mtcars dataset.
An ANOVA is also performed on the two models.
anova(model_A, model_B)
## Analysis of Variance Table
##
## Model 1: mpg ~ wt + qsec + am
## Model 2: mpg ~ (wt + qsec) * am
## Res.Df RSS Df Sum of Sq F Pr(>F)
## 1 28 169.29
## 2 26 116.47 2 52.812 5.8945 0.007743 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Model B is a significantly better fit than Model A.
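As a worked check of the ANOVA table, the F statistic can be recomputed by hand from the residual sums of squares and degrees of freedom it reports. This is only an illustrative sketch; the variable names are hypothetical and the inputs are the rounded values printed above, so the result agrees with the table only up to rounding.

```r
# Values copied from the ANOVA table (rounded as printed).
rss_A <- 169.29; df_A <- 28   # Model A: mpg ~ wt + qsec + am
rss_B <- 116.47; df_B <- 26   # Model B: mpg ~ (wt + qsec) * am

# F = (drop in RSS per added parameter) / (residual variance of the larger model)
f_stat <- ((rss_A - rss_B) / (df_A - df_B)) / (rss_B / df_B)

# Upper-tail probability of the F distribution with (2, 26) degrees of freedom.
p_val <- pf(f_stat, df_A - df_B, df_B, lower.tail = FALSE)

print(c(F = f_stat, p = p_val))
```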
Besides that, a comparison of the AIC, adjusted R-squared, and residual diagnostics of both models also suggests that Model B is a better model than Model A.
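The comparison can be collected in one small table. The sketch below refits both models on a fresh copy of the data (`dat` and `comparison` are hypothetical names) and uses only base R; it is meant as a convenience for reading the numbers side by side, not as an additional test.

```r
# Fresh copy of the data with am as a factor (0 = automatic, 1 = manual).
dat <- datasets::mtcars
dat$am <- factor(dat$am)

model_A <- lm(mpg ~ wt + qsec + am, data = dat)
model_B <- lm(mpg ~ (wt + qsec) * am, data = dat)

# extractAIC() returns c(equivalent degrees of freedom, AIC); keep the AIC.
comparison <- data.frame(
  model  = c("A", "B"),
  aic    = c(extractAIC(model_A)[2], extractAIC(model_B)[2]),
  adj_r2 = c(summary(model_A)$adj.r.squared, summary(model_B)$adj.r.squared),
  sigma  = c(summary(model_A)$sigma, summary(model_B)$sigma)
)
print(comparison)
```

A lower AIC, a higher adjusted R-squared and a lower residual standard error (sigma) all point in the same direction here.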
According to Model B, the following interpretations can be made:
The effect of transmission type (am) on fuel efficiency (mpg) is not statistically significant when weight (wt) and acceleration (qsec) are held constant.
The weight of the car (wt) has a significant negative effect on fuel efficiency (mpg), with a bigger effect on manual transmission cars (fuel efficiency falls by 6.75 miles per US gallon for every 1000 lbs increase in weight) than on automatic transmission cars (fuel efficiency falls by 3.00 miles per US gallon for every 1000 lbs increase in weight).
The quarter-mile time (qsec) has a significant effect on fuel efficiency (mpg) for automatic transmission cars: for every additional second an automatic car needs to cover a quarter mile from standstill, fuel efficiency increases by 0.95 miles per US gallon. The qsec:am1 interaction term is not significant (p = 0.68), so the model gives no evidence that this effect differs for manual transmission cars.
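The per-transmission slopes quoted in these interpretations come from adding the interaction coefficients to the main effects. A minimal sketch of that arithmetic, assuming a fresh copy of the data (`dat`, `b` and `slopes` are hypothetical names) and the mtcars coding of am (0 = automatic, 1 = manual):

```r
# Refit Model B on a fresh copy so the block runs on its own.
dat <- datasets::mtcars
dat$am <- factor(dat$am)   # levels "0" (automatic) and "1" (manual)

model_B <- lm(mpg ~ (wt + qsec) * am, data = dat)
b <- coef(model_B)

# Automatic cars are the reference level, so their slopes are the main effects.
# Manual cars add the corresponding interaction coefficient to each slope.
slopes <- rbind(
  automatic = c(wt = b[["wt"]], qsec = b[["qsec"]]),
  manual    = c(wt   = b[["wt"]] + b[["wt:am1"]],
                qsec = b[["qsec"]] + b[["qsec:am1"]])
)
print(round(slopes, 2))
```

The wt column reproduces the 3.00 and 6.75 mpg reductions per 1000 lbs; the qsec column shows the 0.95 slope for automatic cars alongside the manual-car slope implied by the non-significant interaction term.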
Based solely on the analysis of the mtcars dataset, there is no significant relationship between a car’s transmission type (automatic or manual) and its fuel efficiency. However, it is interesting to note that the weight of the car has a bigger negative effect on fuel efficiency in manual transmission cars than in automatic transmission cars. In automatic transmission cars, slower acceleration (a longer quarter-mile time) is also associated with higher fuel efficiency.
Unfortunately, due to the limited number of observations in the dataset (n=32), we are unable to split the dataset to perform any meaningful cross-validation of the models. As such, the selected models may be overfitted or underfitted. In addition, the limited sample size may reduce the statistical power of the models: some predictors that would be significant in a larger sample may not be significant here. Due to this limitation, we are unable to quantify any significant difference in fuel efficiency between the two transmission types.
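Although a train/test split is impractical at this sample size, leave-one-out residuals can still give a rough indication of out-of-sample error. The sketch below is not part of the original analysis and does not remove the small-sample limitation. It assumes ordinary least squares fits, for which the leave-one-out residual equals the ordinary residual divided by one minus the hat value; `dat` and `loocv_rmse` are hypothetical names.

```r
# Fresh copy of the data with am as a factor (0 = automatic, 1 = manual).
dat <- datasets::mtcars
dat$am <- factor(dat$am)

model_A <- lm(mpg ~ wt + qsec + am, data = dat)
model_B <- lm(mpg ~ (wt + qsec) * am, data = dat)

# Leave-one-out RMSE for an lm fit, computed without refitting:
# e_i / (1 - h_i) is the prediction error for observation i when it is left out.
loocv_rmse <- function(fit) {
  h <- hatvalues(fit)
  sqrt(mean((residuals(fit) / (1 - h))^2))
}

print(c(model_A = loocv_rmse(model_A), model_B = loocv_rmse(model_B)))
```

With only 32 cars, each estimate rests on few observations, so any difference between the two values should be read with caution.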