Executive Summary

The purpose of this report is to examine the effect, if any, of transmission type (automatic vs. manual) on fuel economy. To do this we examine the mtcars data set from the R datasets package. The data were extracted from the 1974 Motor Trend US magazine and comprise fuel economy and 10 aspects of vehicle design and performance for 32 automobiles (1973-74 models). We will perform exploratory data analysis, fit linear models, examine residual and diagnostic plots, and draw data-driven conclusions about the effect of transmission type on fuel economy.

Exploratory Data Analysis

We begin by plotting fuel economy against transmission type. Fig. A1, in the Appendix, shows fuel economy by transmission type alone. Fig. A2 additionally groups the cars by number of cylinders and shows, visually, that cylinder count can change the apparent strength of that relationship considerably. Because other vehicle characteristics may confound the comparison, we fit several models to decide which variables to include in the final model from which we draw conclusions.
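A numeric companion to the two box plots may help make the confounding concrete. This is a sketch using base R only; the raw data code `am` as 0 = automatic and 1 = manual, and the object names (`by_am`, `by_am_cyl`, `cell_counts`) are illustrative rather than taken from the report.

```r
# Mean fuel economy by transmission type, then by transmission and cylinder count.
data(mtcars)

# am is coded 0 = automatic, 1 = manual in the raw data
by_am <- sapply(split(mtcars$mpg, mtcars$am), mean)
print(by_am)

# Cross-classify by cylinders (the grouping used in Fig. A2)
by_am_cyl <- aggregate(mpg ~ am + cyl, data = mtcars, FUN = mean)
print(by_am_cyl)

# Number of cars in each am x cyl cell; small cells make within-cell comparisons noisy
cell_counts <- table(am = mtcars$am, cyl = mtcars$cyl)
print(cell_counts)
```

The cell counts are worth reading alongside the means: automatic cars are concentrated among 8-cylinder models and manual cars among 4-cylinder models, which is one reason the raw difference in Fig. A1 cannot be attributed to transmission alone.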

Modeling

We now use nested model testing to decide which covariates to include in the final model. Starting from a model with transmission type as the only regressor, we add one covariate at a time and compare each model with the one before it using a sequential F-test.
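The partial F-test behind each comparison can be sketched by hand. This minimal example, assuming only base R, compares the first two models (am alone, then am + disp) and checks the hand-computed statistic against `anova()`; the names `reduced` and `full` are illustrative. It keeps `am` as the raw 0/1 code, which gives the same residual sums of squares as the two-level factor used in the report.

```r
# Partial F-test for one nested comparison, computed by hand.
data(mtcars)

reduced <- lm(mpg ~ am, data = mtcars)
full    <- lm(mpg ~ am + disp, data = mtcars)

rss_r <- sum(residuals(reduced)^2)                  # residual sum of squares, smaller model
rss_f <- sum(residuals(full)^2)                     # residual sum of squares, larger model
q     <- df.residual(reduced) - df.residual(full)   # number of added parameters
df_f  <- df.residual(full)

# F = (drop in RSS per added parameter) / (residual mean square of the larger model)
F_stat <- ((rss_r - rss_f) / q) / (rss_f / df_f)
p_val  <- pf(F_stat, q, df_f, lower.tail = FALSE)

# Compare with R's built-in two-model comparison
tab <- anova(reduced, full)
print(c(F_manual = F_stat, F_anova = tab$F[2]))
print(c(p_manual = p_val, p_anova = tab$`Pr(>F)`[2]))
```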

library(datasets); library(ggplot2)
data(mtcars)        # load the data

# recode transmission type (0 = automatic, 1 = manual) with descriptive labels
mtcars$am <- as.factor(mtcars$am)
levels(mtcars$am)[levels(mtcars$am)==0] <- "Automatic"
levels(mtcars$am)[levels(mtcars$am)==1] <- "Manual"

# build nested models, adding variables one at a time
fit1 <- lm(mpg ~ am, data=mtcars)
fit2 <- update(fit1, mpg ~ am + disp)
fit3 <- update(fit2, mpg ~ am + disp + hp)
fit4 <- update(fit3, mpg ~ am + disp + hp + drat)
fit5 <- update(fit4, mpg ~ am + disp + hp + drat + wt)
fit6 <- update(fit5, mpg ~ am + disp + hp + drat + wt + qsec)
fit7 <- update(fit6, mpg ~ am + disp + hp + drat + wt + qsec + vs)
fit8 <- update(fit7, mpg ~ am + disp + hp + drat + wt + qsec + vs + cyl)
fit9 <- update(fit8, mpg ~ am + disp + hp + drat + wt + qsec + vs + cyl + gear)
fit10 <- update(fit9, mpg ~ am + disp + hp + drat + wt + qsec + vs + cyl + gear + carb)

# print the sequential analysis of variance table across the nested models, used for covariate selection
anova(fit1, fit2, fit3, fit4, fit5, fit6, fit7, fit8, fit9, fit10)
## Analysis of Variance Table
## 
## Model  1: mpg ~ am
## Model  2: mpg ~ am + disp
## Model  3: mpg ~ am + disp + hp
## Model  4: mpg ~ am + disp + hp + drat
## Model  5: mpg ~ am + disp + hp + drat + wt
## Model  6: mpg ~ am + disp + hp + drat + wt + qsec
## Model  7: mpg ~ am + disp + hp + drat + wt + qsec + vs
## Model  8: mpg ~ am + disp + hp + drat + wt + qsec + vs + cyl
## Model  9: mpg ~ am + disp + hp + drat + wt + qsec + vs + cyl + gear
## Model 10: mpg ~ am + disp + hp + drat + wt + qsec + vs + cyl + gear + carb
##    Res.Df RSS Df Sum of Sq     F  Pr(>F)    
## 1      30 721                               
## 2      29 300  1       421 59.89 1.4e-07 ***
## 3      28 226  1        74 10.56  0.0038 ** 
## 4      27 221  1         5  0.72  0.4055    
## 5      26 176  1        45  6.46  0.0190 *  
## 6      25 150  1        26  3.64  0.0701 .  
## 7      24 149  1         1  0.09  0.7648    
## 8      23 149  1         1  0.08  0.7776    
## 9      22 148  1         1  0.14  0.7137    
## 10     21 147  1         0  0.06  0.8122    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
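One detail of the table above is easy to miss: when `anova()` is given several nested models at once, every F statistic uses the residual mean square of the largest model (Model 10) as its denominator, not that of the pair being compared. For row 2, 421 / (147 / 21) is about 60, in line with the printed 59.89 once the rounding of the sums of squares is allowed for. The sketch below reproduces row 2 under that reading; it refits only three of the ten models, which leaves the largest model, and therefore the denominator, unchanged. As before, `am` is the raw 0/1 code, which does not change any sum of squares.

```r
# Reproduce row 2 (Model 1 vs Model 2) of a multi-model ANOVA table.
data(mtcars)

fit1  <- lm(mpg ~ am, data = mtcars)
fit2  <- lm(mpg ~ am + disp, data = mtcars)
fit10 <- lm(mpg ~ am + disp + hp + drat + wt + qsec + vs + cyl + gear + carb,
            data = mtcars)

# Denominator: residual mean square of the largest model in the list
sigma2_big <- deviance(fit10) / df.residual(fit10)

ss_added <- deviance(fit1) - deviance(fit2)        # "Sum of Sq" for row 2
df_added <- df.residual(fit1) - df.residual(fit2)  # "Df" for row 2
F_row2   <- (ss_added / df_added) / sigma2_big
p_row2   <- pf(F_row2, df_added, df.residual(fit10), lower.tail = FALSE)

tab <- anova(fit1, fit2, fit10)
print(c(F_manual = F_row2, F_anova = tab$F[2]))
print(c(p_manual = p_row2, p_anova = tab$`Pr(>F)`[2]))
```

A consequence is that the significance codes in the table are not the same numbers one would get by running each pairwise comparison on its own.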

We include every variable whose row in the table above carries a significance code, i.e. p < 0.1. This selects disp, hp, wt, and qsec in addition to am (transmission type). Because these are sequential tests, each variable is assessed only after the variables added before it, so the selection depends on the order in which the covariates were entered.

Diagnostics

From Fig. A3, the residuals show little to no pattern. Two observations have relatively high leverage; however, both have small Cook’s distances, so we retain them.
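The leverage and influence readings from Fig. A3 can also be checked numerically. This is a sketch that refits the selected model with base R; the cut-offs used (leverage above 2p/n, Cook’s distance above 4/n) are common rules of thumb assumed here, not criteria stated in the report, and the names `high_lev` and `influential` are illustrative.

```r
# Flag high-leverage and influential cars for the selected model.
data(mtcars)
fit <- lm(mpg ~ am + disp + hp + wt + qsec, data = mtcars)

n <- nrow(mtcars)
p <- length(coef(fit))        # 6 coefficients including the intercept

h  <- hatvalues(fit)          # leverage of each car
cd <- cooks.distance(fit)     # influence of each car on the fitted coefficients

# Assumed rules of thumb, not taken from the report
high_lev    <- names(h)[h > 2 * p / n]
influential <- names(cd)[cd > 4 / n]

print(round(sort(h,  decreasing = TRUE)[1:3], 3))
print(round(sort(cd, decreasing = TRUE)[1:3], 3))
print(high_lev)
print(influential)
```

The leverages always sum to the number of fitted coefficients, which is why 2p/n (twice the average leverage) is a natural reference point.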

library(car)

# fit the selected model used for the rest of the analysis
fit <- lm(mpg ~ am + disp + hp + wt + qsec, data=mtcars)

# compute Variance Inflation Factors diagnostic
vif(fit)
##    am  disp    hp    wt  qsec 
## 2.887 9.072 5.195 7.171 3.791

The Variance Inflation Factors from this multiple linear regression model, as high as 9.07 for disp and 7.17 for wt, indicate substantial correlation among the regressors. This is expected: cars with larger displacement (engine size) tend to have more horsepower and greater weight, and so on. We nevertheless keep these covariates, because adjusting for them is necessary when estimating the effect of transmission type on fuel economy.
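For intuition about what the VIF measures, it can be computed from first principles as VIF_j = 1 / (1 - R_j^2), where R_j^2 comes from regressing predictor j on the other predictors. This sketch uses base R only; `am` is the raw 0/1 code, and because every term in this model has one degree of freedom the results should agree with the values printed by `car::vif()` above. The name `vif_by_hand` is illustrative.

```r
# Variance Inflation Factors via auxiliary regressions.
data(mtcars)
predictors <- c("am", "disp", "hp", "wt", "qsec")

vif_by_hand <- sapply(predictors, function(v) {
  others <- setdiff(predictors, v)
  # auxiliary regression of predictor v on all other predictors
  aux <- lm(reformulate(others, response = v), data = mtcars)
  1 / (1 - summary(aux)$r.squared)
})
print(round(vif_by_hand, 3))
```

An equivalent route is the diagonal of the inverse correlation matrix of the predictors, which avoids fitting one auxiliary model per variable.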

Hypothesis Testing and Conclusions

We set the null hypothesis \(H_0\): transmission type has no effect on fuel economy, against the alternative \(H_a\): transmission type does affect fuel economy, holding all other variables in the model constant. If the 95% confidence interval for the manual-transmission coefficient (relative to automatic) excludes 0, we reject the null hypothesis at the 5% significance level.
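Formally, writing \(\beta_{\text{am}}\) for the coefficient on the manual-transmission indicator, the test and the interval can be sketched as follows, with \(n = 32\) cars and \(p = 6\) estimated coefficients (intercept, am, disp, hp, wt, qsec). The worked numbers use the amManual estimate and standard error from the coefficient table, and \(t_{0.975,\,26} \approx 2.0555\).

\[
H_0:\ \beta_{\text{am}} = 0 \qquad \text{vs.} \qquad H_a:\ \beta_{\text{am}} \neq 0
\]

\[
\hat\beta_{\text{am}} \pm t_{0.975,\,n-p}\,\widehat{\mathrm{SE}}(\hat\beta_{\text{am}})
= 3.47045 \pm t_{0.975,\,26} \times 1.4858
\approx 3.47045 \pm 2.0555 \times 1.4858
\approx (0.4164,\ 6.5245)
\]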

# store and print the coefficients of the model
sumCoef <- summary(fit)$coefficients
print(sumCoef)
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept) 14.36190     9.7408   1.474 0.152378
## amManual     3.47045     1.4858   2.336 0.027488
## disp         0.01124     0.0106   1.060 0.298972
## hp          -0.02117     0.0145  -1.460 0.156387
## wt          -4.08433     1.1941  -3.420 0.002075
## qsec         1.00690     0.4754   2.118 0.043908

Now let’s look at the 95% confidence interval for the manual transmission coefficient.

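# compute and print the 95% confidence interval for manual transmissions' effect on fuel economy (relative to automatic)
# residual degrees of freedom: n - p = 32 - 6 = 26
manInt <- sumCoef["amManual", "Estimate"] + c(-1, 1) * qt(0.975, df=fit$df.residual) * sumCoef["amManual", "Std. Error"]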
print(manInt)
## [1] 0.4164 6.5245
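As a cross-check, base R’s `confint()` computes the same t-based interval, so the hand-built interval can be compared against it. This self-contained sketch refits the model with the same factor coding as the report (so the coefficient is named `amManual`) and indexes the coefficient table by name rather than by position; `ci_manual` and `ci_builtin` are illustrative names.

```r
# Compare a hand-built 95% CI with confint() for the manual-transmission coefficient.
data(mtcars)
mtcars$am <- factor(mtcars$am, levels = c(0, 1), labels = c("Automatic", "Manual"))
fit <- lm(mpg ~ am + disp + hp + wt + qsec, data = mtcars)

ct  <- coef(summary(fit))
est <- ct["amManual", "Estimate"]
se  <- ct["amManual", "Std. Error"]

ci_manual  <- est + c(-1, 1) * qt(0.975, df = df.residual(fit)) * se
ci_builtin <- as.vector(confint(fit, "amManual", level = 0.95))
print(rbind(manual = ci_manual, confint = ci_builtin))
```

Indexing by name means the code keeps pointing at the right row even if covariates are reordered in the formula.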

The 95% confidence interval for the manual-transmission coefficient lies entirely above 0, so we reject the null hypothesis at the 5% level (equivalently, the coefficient’s p-value of 0.0275 is below 0.05). Holding displacement, horsepower, weight, and quarter-mile time constant, manual transmissions are estimated to improve fuel economy by 3.47 MPG on average relative to automatic transmissions, with a 95% confidence interval of 0.42 to 6.52 MPG. The assumptions that went into this study are as follows:
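The equivalence between "the 95% interval excludes 0" and "the two-sided p-value is below 0.05" can be checked directly. This sketch keeps `am` as the raw 0/1 code, so the coefficient is named `am` and has the same estimate as `amManual`; the variable names are illustrative.

```r
# Check that the CI decision and the t-test decision agree for the am coefficient.
data(mtcars)
fit <- lm(mpg ~ am + disp + hp + wt + qsec, data = mtcars)  # am: 0 = automatic, 1 = manual

ct   <- coef(summary(fit))
t_am <- ct["am", "Estimate"] / ct["am", "Std. Error"]
p_am <- 2 * pt(-abs(t_am), df = df.residual(fit))   # two-sided p-value

ci <- confint(fit, "am", level = 0.95)
excludes_zero <- ci[1] > 0 || ci[2] < 0

print(c(t = t_am, p = p_am))
print(excludes_zero == (p_am < 0.05))
```

Both decisions use the same t distribution with 26 residual degrees of freedom, which is why they cannot disagree except exactly at the boundary.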

Appendix

Fig. A1

# plot basic transmission type vs fuel economy box plot
g <- ggplot(data=mtcars, aes(am, mpg)) + geom_boxplot(aes(fill=am)) + 
    labs(x="", y="Fuel Economy (MPG)") + theme(legend.position="none")
print(g)

Fig. A2

# add number of cylinders to FigA1 to show influence of other variables
g <- ggplot(data=mtcars, aes(am, mpg)) + geom_boxplot(aes(fill=factor(cyl))) + 
    labs(x="", y="Fuel Economy (MPG)", fill="Cylinders")
print(g)

Fig. A3

# draw the four standard lm diagnostic plots in a 2x2 grid
par(mfrow=c(2,2))
plot(fit)