Executive Summary

In this article, we explore whether automatic or manual transmission is better for fuel efficiency, using data published in Motor Trend magazine in 1974 on 32 automobiles (1973-1974 models). The finding from this data set is that vehicles with manual transmission hold a slight edge in fuel efficiency, but the difference is not statistically significant once weight and horsepower are accounted for.

Exploratory Data Analysis

We will attempt to develop a multivariate regression model, with the dependent variable being mpg (miles per gallon) and the independent variables being the design features of automobiles in the mtcars dataset. A pairs plot is generated in Figure A1. The lower panel displays the scatter plot for each pair of variables, and the upper panel displays the absolute value of the correlation for each pair.

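# Upper-panel function for pairs(): prints |cor(x, y)| with text size
# scaled by the strength of the correlation.
panel.cor <- function(x, y, digits = 2, prefix = "", cex.cor, ...)
{
    # Save the panel's user coordinates and restore them on exit
    usr <- par("usr"); on.exit(par(usr = usr))
    # Switch to a unit square so the text can be centered at (0.5, 0.5)
    par(usr = c(0, 1, 0, 1)); r <- abs(cor(x, y))
    txt <- format(c(r, 0.123456789), digits = digits)[1]
    txt <- paste0(prefix, txt)
    if(missing(cex.cor)) cex.cor <- 0.8/strwidth(txt)
    text(0.5, 0.5, txt, cex = cex.cor * r)
}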
pairs(mtcars, lower.panel = panel.smooth, upper.panel = panel.cor)
Figure A1 - Pairs Plot of the Columns of mtcars with Scatter and Correlation


Observing the figure, we can see that the variables that have a high absolute correlation with mpg are wt (weight, at 87%), cyl (number of cylinders, at 85%), disp (engine displacement, at 85%), and hp (horsepower, at 78%). The figure shows magnitudes only; all four of these correlations with mpg are negative. We also see that cyl and disp are highly correlated with each other, at 90%.
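
The values read off the pairs plot can also be checked numerically. The following is a minimal sketch using only base R and the built-in mtcars data; the object names (cors, mpg_cor, abs_sorted, cyl_disp) are chosen here for illustration and do not appear elsewhere in the analysis.

```r
# Correlation of every other mtcars column with mpg.
# mtcars ships with base R (datasets package), so no extra packages are needed.
cors <- cor(mtcars)                              # full correlation matrix
mpg_cor <- cors["mpg", colnames(cors) != "mpg"]  # drop mpg's correlation with itself

# Rank by magnitude, which is what panel.cor() displays
abs_sorted <- sort(abs(mpg_cor), decreasing = TRUE)
print(round(abs_sorted, 2))

# The signed values show the direction that abs() hides in the figure
print(round(mpg_cor[names(abs_sorted)[1:4]], 2))

# Correlation between the two engine-size variables
cyl_disp <- cor(mtcars$cyl, mtcars$disp)
print(round(cyl_disp, 2))
```

Sorting by absolute value mirrors the figure, while the signed values make explicit that heavier and more powerful cars tend to have lower mpg.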

Model Selection

We start off with mpg as the dependent variable and am as the first regressor, since the goal is to compare automatic transmission to manual transmission. There is no need to cast am as a factor: it is coded as 0 (automatic) and 1 (manual), so a one-unit change in am is exactly the switch from automatic to manual, and its coefficient can be read directly as the difference in expected mpg between the two transmission types.
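
As a quick check of this point, the sketch below fits the single-regressor model twice, once with am as a 0/1 number and once as a factor, and compares the slope with the raw difference in group means. The names fit_num, fit_fac, and mean_diff are illustrative only.

```r
# am as a plain 0/1 numeric regressor
fit_num <- lm(mpg ~ am, data = mtcars)

# am as a two-level factor (automatic = reference level "0")
fit_fac <- lm(mpg ~ factor(am), data = mtcars)

# Both slopes estimate the same manual-minus-automatic difference
print(coef(fit_num))
print(coef(fit_fac))

# With a single binary regressor, that slope equals the difference in group means
mean_diff <- mean(mtcars$mpg[mtcars$am == 1]) - mean(mtcars$mpg[mtcars$am == 0])
print(mean_diff)
```

Only the coefficient label changes between the two fits; the estimate and its interpretation are the same.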

The second model would include wt as an additional regressor, due to its high correlation with mpg as seen in the pairs plot. The third model would add hp as another regressor.

The other candidates for regressors that we would like to include are cyl and disp. However, recall that these two features are themselves highly correlated. This makes sense because engine displacement measures total cylinder capacity, which grows with the number of cylinders. To avoid multicollinearity, we include only one of these two variables in the regression analysis. Since disp is a finer-grained quantitative measurement than cyl, we choose disp as the last independent variable to be used in the regression.

model1 <- lm(mpg ~ am, data = mtcars)
model2 <- lm(mpg ~ am + wt, data = mtcars)
model3 <- lm(mpg ~ am + wt + hp, data = mtcars)
model4 <- lm(mpg ~ am + wt + hp + disp, data = mtcars)
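
The multicollinearity concern raised above can be quantified with variance inflation factors (VIFs). This is a supplementary check rather than part of the original analysis. The sketch below computes them by hand in base R as 1 / (1 - R^2), where R^2 comes from regressing each predictor on the others; vif_manual is a hypothetical helper written for this illustration, not a function from any package.

```r
# Hypothetical helper: VIF for each predictor, computed as 1 / (1 - R^2)
# from the regression of that predictor on all the other predictors.
vif_manual <- function(data, predictors) {
  sapply(predictors, function(p) {
    others <- setdiff(predictors, p)
    fit <- lm(reformulate(others, response = p), data = data)
    1 / (1 - summary(fit)$r.squared)
  })
}

# VIFs when both cyl and disp are included alongside am, wt, and hp
vif_both <- vif_manual(mtcars, c("am", "wt", "hp", "cyl", "disp"))
print(round(vif_both, 2))

# VIFs for the regressors actually used in model4 (disp only, no cyl)
vif_disp_only <- vif_manual(mtcars, c("am", "wt", "hp", "disp"))
print(round(vif_disp_only, 2))
```

A larger VIF means that a predictor is better explained by the others, which inflates the variance of its coefficient; comparing the two printed vectors shows how much including cyl adds to that inflation for disp.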

Next, the nested models are compared with anova(), which for lm fits performs sequential F-tests, to assess how much each model improves upon the previous one.

anova(model1, model2, model3, model4)
## Analysis of Variance Table
## 
## Model 1: mpg ~ am
## Model 2: mpg ~ am + wt
## Model 3: mpg ~ am + wt + hp
## Model 4: mpg ~ am + wt + hp + disp
##   Res.Df    RSS Df Sum of Sq       F    Pr(>F)    
## 1     30 720.90                                   
## 2     29 278.32  1    442.58 66.4206 9.394e-09 ***
## 3     28 180.29  1     98.03 14.7118 0.0006826 ***
## 4     27 179.91  1      0.38  0.0576 0.8122229    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
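
To make the last row of the table above concrete, the F statistic for adding disp can be reproduced by hand from the residual sums of squares. This is a minimal sketch; the names m3, m4, rss3, rss4, f_stat, and p_val are introduced here for illustration.

```r
# Refit the two models being compared (same formulas as model3 and model4)
m3 <- lm(mpg ~ am + wt + hp, data = mtcars)
m4 <- lm(mpg ~ am + wt + hp + disp, data = mtcars)

# Residual sums of squares (the RSS column of the table)
rss3 <- deviance(m3)
rss4 <- deviance(m4)

# Degrees of freedom: one extra parameter, 27 residual df in the larger model
df_num <- df.residual(m3) - df.residual(m4)
df_den <- df.residual(m4)

# F = (drop in RSS per added parameter) / (residual mean square of larger model)
f_stat <- ((rss3 - rss4) / df_num) / (rss4 / df_den)
p_val  <- pf(f_stat, df_num, df_den, lower.tail = FALSE)
print(c(F = f_stat, p = p_val))
```

The drop in RSS from adding disp is tiny relative to the residual mean square, which is why the F statistic is close to zero and the p-value is large.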

We see that model2 and model3 each show a significant improvement over the preceding model, whereas model4 does not: the p-value of its test is 0.81, which is far too high to indicate significance. Therefore, model3 is our final model. The diagnostic plots of the model are found in Figure A2.

model3 <- lm(mpg ~ am + wt + hp, data = mtcars)
par(mfrow=c(2,2))
plot(model3)
Figure A2 - Residuals vs Fitted, Normal Q-Q, Scale-Location, Residuals vs Leverage


Regression Summary, Inference, and Conclusion

Finally, we output the summary of the chosen model and the 95% confidence interval for the am coefficient.

smry <- summary(model3); smry
## 
## Call:
## lm(formula = mpg ~ am + wt + hp, data = mtcars)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -3.4221 -1.7924 -0.3788  1.2249  5.5317 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 34.002875   2.642659  12.867 2.82e-13 ***
## am           2.083710   1.376420   1.514 0.141268    
## wt          -2.878575   0.904971  -3.181 0.003574 ** 
## hp          -0.037479   0.009605  -3.902 0.000546 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2.538 on 28 degrees of freedom
## Multiple R-squared:  0.8399, Adjusted R-squared:  0.8227 
## F-statistic: 48.96 on 3 and 28 DF,  p-value: 2.908e-11
smry$coefficients["am", "Estimate"] + c(-1, 1) * qt(.975, df = model3$df.residual) * smry$coefficients["am", "Std. Error"]
## [1] -0.7357587  4.9031790

This model estimates that manual transmission increases fuel efficiency by 2.08 miles per gallon for vehicles of the same weight and horsepower, with a 95% confidence interval of [-0.74, 4.90] miles per gallon. Note that the interval contains zero (the lower bound is negative), so the am coefficient is not statistically significant at the 5% level, consistent with its p-value of 0.14 in the summary.
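
The same interval can be obtained with the built-in confint() function, which serves as a cross-check on the manual calculation. The sketch below refits the model so that it runs on its own; ci_builtin, ci_manual, est, and se are illustrative names.

```r
# Refit the chosen model so this snippet is self-contained
model3 <- lm(mpg ~ am + wt + hp, data = mtcars)

# Built-in t-based confidence interval for the am coefficient
ci_builtin <- confint(model3, "am", level = 0.95)
print(ci_builtin)

# Manual version: estimate +/- t quantile * standard error
coefs <- coef(summary(model3))
est <- coefs["am", "Estimate"]
se  <- coefs["am", "Std. Error"]
ci_manual <- est + c(-1, 1) * qt(0.975, df = df.residual(model3)) * se
print(ci_manual)
```

Indexing the coefficient table by name ("am", "Estimate") rather than by position keeps the calculation correct if the order of regressors in the formula ever changes.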

Additional diagnostic plots are given in Figure A3 and Figure A4. The plots show that the Maserati Bora (manual, 15.0 mpg) has the highest leverage in the data set when fitting the model, and that removing the Chrysler Imperial (automatic, 14.7 mpg) from the data set would result in the greatest change in the coefficient for am.

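# Leverage of each observation and its influence on each coefficient
hatvalues <- hatvalues(model3)
dfbetas <- dfbetas(model3)
# Keep only the influence on the am coefficient
dfbetas.am <- dfbetas[, "am"]

# Label each car with (M) for manual or (A) for automatic
rona <- names(hatvalues)
names(hatvalues) <- ifelse(mtcars$am == 1, paste(rona, '(M)'), paste(rona, '(A)'))
names(dfbetas.am) <- ifelse(mtcars$am == 1, paste(rona, '(M)'), paste(rona, '(A)'))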

par(mai=c(0.5,2.5,0.5,1))
barplot(hatvalues, horiz=TRUE, las=1, cex.names = 0.7, xlab='hatvalues', col='red')
Figure A3 - hatvalues for Each Automobile


par(mai=c(0.5,2.5,0.5,1))
barplot(dfbetas.am, horiz=TRUE, las=1, cex.names = 0.7, xlab='dfbetas for am', col='blue')
Figure A4 - dfbetas of Automatic/Manual Coefficient for Each Automobile
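
The cars singled out in Figures A3 and A4 can also be identified programmatically rather than read off the bar plots. The sketch below is one way to do so under stated assumptions: the names lev, dfb_am, top_leverage, top_influence, and high_lev are illustrative, and the 2p/n cutoff for flagging high leverage is a common rule of thumb, not something taken from the analysis above.

```r
# Refit the chosen model so this snippet is self-contained
model3 <- lm(mpg ~ am + wt + hp, data = mtcars)

lev    <- hatvalues(model3)            # leverage of each car
dfb_am <- dfbetas(model3)[, "am"]      # standardized influence on the am coefficient

# Car with the largest leverage, and car whose removal shifts am the most
top_leverage  <- names(which.max(lev))
top_influence <- names(which.max(abs(dfb_am)))
print(c(leverage = top_leverage, influence = top_influence))

# Assumed rule of thumb: flag leverage above 2 * p / n,
# where p is the number of coefficients and n the number of cars
p <- length(coef(model3))
n <- nrow(mtcars)
high_lev <- names(lev)[lev > 2 * p / n]
print(high_lev)
```

Using abs() on the dfbetas matters here because a car can pull the am coefficient in either direction, and the bar plot in Figure A4 shows both signs.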