Executive Summary

We examine the mtcars data and fit a linear model to determine whether an automatic or a manual transmission is better for fuel efficiency (“mpg”), and we quantify the difference if one exists.

Data Exploration and Preparation

data(mtcars)
# Treat the categorical variables as factors
for (v in c("am", "vs", "gear", "cyl", "carb")) mtcars[[v]] <- factor(mtcars[[v]])

The data include mileage (mpg), number of cylinders (cyl), displacement (disp), horsepower (hp), rear axle ratio (drat), weight (wt), 1/4 mile time (qsec), cylinder arrangement (vs), transmission type (am), number of forward gears (gear) and number of carburetors (carb) for 32 different vehicle models. Figure 1 shows boxplots of mpg by transmission type. At a glance, there appears to be a difference in median mpg between automatic and manual transmissions. We will explore this further by regressing mpg on this factor.
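
As a quick numeric check on the boxplot, the group medians can be computed directly. This is a minimal sketch that works on a local copy (the names cars and med are illustrative, not part of the analysis) so that the prepared mtcars object is left untouched.

```r
# Work on a fresh copy so the prepared mtcars object is not modified
cars <- datasets::mtcars
cars$am <- factor(cars$am, labels = c("Automatic", "Manual"))  # 0 = automatic, 1 = manual

# Median mpg for each transmission type
med <- aggregate(mpg ~ am, data = cars, FUN = median)
med
```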

Modelling

We begin with the simplest linear model, a regression of mpg on transmission type (am). From the \(R^{2}\), we see that this model explains only about 36% of the variance in mpg around its mean. We will use the step() function to search for a better model. Starting from the full model, step() adds and removes terms one at a time, keeps the change that most lowers the AIC, and stops when no single change lowers it further.

mpg_fit1 <- lm(mpg ~ am, mtcars)
summary(mpg_fit1)$r.squared
## [1] 0.3597989
best <- step(lm(mpg~.,mtcars), direction = "both")
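
To make the selection criterion concrete, the sketch below compares the AIC of the starting model with that of the model step() settles on. It assumes the same factor conversions as in the data preparation, applied to a local copy (cars, full_fit and sel are illustrative names), and sets trace = 0 only to suppress the step-by-step log.

```r
# Local copy with the same categorical variables converted to factors
cars <- datasets::mtcars
for (v in c("am", "vs", "gear", "cyl", "carb")) cars[[v]] <- factor(cars[[v]])

full_fit <- lm(mpg ~ ., data = cars)
sel <- step(full_fit, direction = "both", trace = 0)

# extractAIC() returns c(equivalent df, AIC) on the scale step() uses
aic_full <- extractAIC(full_fit)[2]
aic_sel  <- extractAIC(sel)[2]
c(full = aic_full, selected = aic_sel)

# Terms retained by the search
formula(sel)
```

A lower AIC does not by itself mean a higher \(R^{2}\); AIC trades goodness of fit against the number of parameters.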

The model we carry forward, mpg_fit2 (below), has an adjusted \(R^{2}\) of about 0.83, meaning it explains roughly 83% of the variance of the data around its mean, so it is much stronger than the earlier fit. Note that the call printed for the step() result is mpg ~ cyl + hp + wt + am, whereas mpg_fit2 is mpg ~ wt + qsec + am; the adjusted \(R^{2}\) and all later results refer to mpg_fit2. Also, by comparing the residual plots in Figure 2 and Figure 3, we can see there is more heteroskedasticity in this model. Figure 3 flags several potential outliers (Chrysler Imperial, Fiat 128 and Toyota Corolla), which we may wish to exclude.

summary(best)$call
## lm(formula = mpg ~ cyl + hp + wt + am, data = mtcars)
mpg_fit2 <- lm(mpg ~ wt + qsec + am, mtcars)
summary(mpg_fit2)$adj.r.squared
## [1] 0.8335561

We now compare the models. Using anova() on the nested sequence, we see that each added term (qsec, then wt) gives a statistically significant reduction in the residual sum of squares relative to the model without it.

mpg_fit1.a <- lm(mpg ~ am + qsec, mtcars)
anova(mpg_fit1, mpg_fit1.a, mpg_fit2)
## Analysis of Variance Table
## 
## Model 1: mpg ~ am
## Model 2: mpg ~ am + qsec
## Model 3: mpg ~ wt + qsec + am
##   Res.Df    RSS Df Sum of Sq      F    Pr(>F)    
## 1     30 720.90                                  
## 2     29 352.63  1    368.26 60.911 1.679e-08 ***
## 3     28 169.29  1    183.35 30.326 6.953e-06 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
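
As a worked check on the table, each F statistic is the drop in RSS per added degree of freedom divided by the residual mean square of the largest model (Model 3), which is how anova() scales a sequence of nested lm fits. Using the rounded values printed above:

\[
\hat{\sigma}^{2} = \frac{\mathrm{RSS}_{3}}{\mathrm{df}_{3}} = \frac{169.29}{28} \approx 6.046, \qquad
F_{2} = \frac{368.26 / 1}{6.046} \approx 60.9, \qquad
F_{3} = \frac{183.35 / 1}{6.046} \approx 30.3
\]

These agree with the F column up to rounding.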

Conclusions

summary(mpg_fit2)$coefficients
##              Estimate Std. Error   t value     Pr(>|t|)
## (Intercept)  9.617781  6.9595930  1.381946 1.779152e-01
## wt          -3.916504  0.7112016 -5.506882 6.952711e-06
## qsec         1.225886  0.2886696  4.246676 2.161737e-04
## am1          2.935837  1.4109045  2.080819 4.671551e-02

The model indicates that a manual transmission is associated with higher gas mileage. The Figure 1 boxplot suggests this before the model is developed, and the am1 coefficient is significant at \(\alpha = .05\) (p ≈ 0.047), which supports the relationship. According to the model, when weight (wt) and 1/4 mile time (qsec) are held constant, we can expect an average increase of 2.94 mpg with a manual transmission. We note that this estimate only narrowly meets the 5% significance threshold, which is acceptable, but the other coefficients in this model (wt and qsec) have much smaller p-values.
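
A confidence interval conveys the size of the effect more directly than the p-value alone. The following is a minimal sketch that refits the same formula on a local copy (cars, fit, ci and est are illustrative names) and asks confint() for the 95% interval on am1. From the estimate and standard error in the coefficient table above, that interval runs from roughly 0.05 to 5.83 mpg.

```r
# Refit the final model on a local copy of the data
cars <- datasets::mtcars
cars$am <- factor(cars$am)
fit <- lm(mpg ~ wt + qsec + am, data = cars)

# 95% confidence interval for the manual-vs-automatic coefficient
ci <- confint(fit, parm = "am1", level = 0.95)
ci

# Same interval by hand: estimate +/- t quantile * standard error
est <- coef(summary(fit))["am1", c("Estimate", "Std. Error")]
est[1] + c(-1, 1) * qt(0.975, df.residual(fit)) * est[2]
```

An interval this wide, with a lower bound close to zero, is consistent with the borderline p-value.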

Appendix

library(ggplot2)
g <-  ggplot(mtcars, aes(x = am, y = mpg)) + geom_boxplot(aes(fill = am)) +
        labs(title = "Figure 1: MPG in Automatic and Manual Transmission Vehicles", 
             x = "Transmission Type") + 
        scale_x_discrete(labels = c("Automatic", "Manual")) +
        scale_fill_discrete("Type", labels = c("Automatic","Manual"))
g

plot(mpg_fit1, which = 1, main = "Figure 2: Residual Plot fit1")

plot(mpg_fit2, which = 1, main = "Figure 3: Residual Plot fit2")