An investigation of the association between transmission and MPG

Research Question 1: Is an automatic or manual transmission better for MPG?

Research Question 2: Quantify the MPG difference between automatic and manual transmissions

Summary

In this 1974 dataset, manual-transmission cars achieve better mpg than automatic ones. Under the best-fitting model, with the other predictors held constant, a manual car is estimated to get about 2.94 mpg more than an automatic car.

In this report, we explored the association between transmission type and mpg in the 1974 Motor Trend magazine dataset (mtcars), which includes 32 cars. To answer the first question, we drew a boxplot (see Appendix) to compare the mpg of automatic and manual cars and computed the difference in group means. Manual cars average about 7.24 mpg more than automatic cars.

To tackle the second question, we first fitted a parsimonious model using backward elimination: we constructed a full model, removed the predictor with the highest p-value, and refitted the model. We repeated this step until every remaining predictor had a p-value below the 0.05 significance level.

Appendix

1. Load data and libraries
data(mtcars)
library(ggplot2)
library(dplyr)

2. Exploratory analysis

str(mtcars)
## 'data.frame':    32 obs. of  11 variables:
##  $ mpg : num  21 21 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2 ...
##  $ cyl : num  6 6 4 6 8 6 8 4 4 6 ...
##  $ disp: num  160 160 108 258 360 ...
##  $ hp  : num  110 110 93 110 175 105 245 62 95 123 ...
##  $ drat: num  3.9 3.9 3.85 3.08 3.15 2.76 3.21 3.69 3.92 3.92 ...
##  $ wt  : num  2.62 2.88 2.32 3.21 3.44 ...
##  $ qsec: num  16.5 17 18.6 19.4 17 ...
##  $ vs  : num  0 0 1 1 0 1 0 1 1 1 ...
##  $ am  : num  1 1 1 0 0 0 0 0 0 0 ...
##  $ gear: num  4 4 4 3 3 3 3 4 4 4 ...
##  $ carb: num  4 4 1 1 2 1 4 2 2 4 ...
summary(mtcars$mpg)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   10.40   15.42   19.20   20.09   22.80   33.90
mtcars$am <- as.factor(mtcars$am)
mtcars$cyl <- as.factor(mtcars$cyl)
mtcars$vs <- as.factor(mtcars$vs)
mtcars$gear <- as.factor(mtcars$gear)
mtcars$carb <- as.factor(mtcars$carb)

# look for correlation
pairs(mtcars, panel = panel.smooth)

# mpg difference between manual and auto transmission
diff(tapply(mtcars$mpg, mtcars$am, mean))
##        1 
## 7.244939
boxplot(mpg ~ am, data = mtcars, main = "auto vs manual on mpg", xlab = "transmission (0: auto, 1: manual)", ylab = "mpg")

The pairs plot suggests that am is correlated with vs, gear, and carb. The boxplot shows that manual cars have higher mpg than automatic cars.

Conclusion for Question 1: Among these 32 cars (1973 to 1974 models), manual transmissions are generally better for mpg than automatic transmissions.
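
The boxplot comparison can be backed by a formal test. The following is a minimal sketch, not part of the original analysis, that assumes a Welch two-sample t-test is an acceptable way to compare the two group means. The names `cars` and `tt` are hypothetical, and a fresh copy of the data is used so that the `mtcars` object modified above is left untouched.

```r
# Sketch: Welch two-sample t-test of mpg by transmission type.
# `cars` is a fresh copy of the dataset (hypothetical name).
cars <- datasets::mtcars
cars$am <- factor(cars$am, levels = c(0, 1), labels = c("auto", "manual"))

# t.test() uses the Welch correction by default (var.equal = FALSE)
tt <- t.test(mpg ~ am, data = cars)

tt$estimate   # mean mpg in each group
tt$conf.int   # 95% CI for mean(auto) - mean(manual)
tt$p.value    # two-sided p-value
```

This comparison is unadjusted: it ignores weight and the other predictors, which the regression model in the next step takes into account.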

3. Select models
# Backward Elimination

mdl_whole <- lm(mpg ~., data = mtcars)
summary(mdl_whole)$coef
##                Estimate  Std. Error     t value   Pr(>|t|)
## (Intercept) 23.87913244 20.06582026  1.19004018 0.25252548
## cyl6        -2.64869528  3.04089041 -0.87102622 0.39746642
## cyl8        -0.33616298  7.15953951 -0.04695316 0.96317000
## disp         0.03554632  0.03189920  1.11433290 0.28267339
## hp          -0.07050683  0.03942556 -1.78835344 0.09393155
## drat         1.18283018  2.48348458  0.47627845 0.64073922
## wt          -4.52977584  2.53874584 -1.78425732 0.09461859
## qsec         0.36784482  0.93539569  0.39325050 0.69966720
## vs1          1.93085054  2.87125777  0.67247551 0.51150791
## am1          1.21211570  3.21354514  0.37718957 0.71131573
## gear4        1.11435494  3.79951726  0.29328856 0.77332027
## gear5        2.52839599  3.73635801  0.67670068 0.50889747
## carb2       -0.97935432  2.31797446 -0.42250436 0.67865093
## carb3        2.99963875  4.29354611  0.69863900 0.49546781
## carb4        1.09142288  4.44961992  0.24528452 0.80956031
## carb6        4.47756921  6.38406242  0.70136677 0.49381268
## carb8        7.25041126  8.36056638  0.86721532 0.39948495
# Reduced model: wt, qsec, and am remain after dropping the other predictors
mdl <- lm(mpg ~ . - cyl - carb - gear - vs - drat - disp - hp, data = mtcars)
summary(mdl)$coef
##              Estimate Std. Error   t value     Pr(>|t|)
## (Intercept)  9.617781  6.9595930  1.381946 1.779152e-01
## wt          -3.916504  0.7112016 -5.506882 6.952711e-06
## qsec         1.225886  0.2886696  4.246676 2.161737e-04
## am1          2.935837  1.4109045  2.080819 4.671551e-02

The model mpg ~ wt + qsec + am is the most parsimonious model after backward elimination of the predictors with the largest p-values.
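
The code above jumps from the full model to the final one. The elimination loop described in the summary could be automated roughly as follows. This is a sketch under stated assumptions: `backward_eliminate` is a hypothetical helper, and it drops whole terms using the F tests from `drop1()` rather than the per-coefficient t-tests shown in the table above, so it may not arrive at exactly the same model.

```r
# Sketch: term-wise backward elimination using drop1() F tests.
# `cars` and `backward_eliminate` are hypothetical names.
cars <- datasets::mtcars
for (v in c("cyl", "vs", "am", "gear", "carb")) {
  cars[[v]] <- factor(cars[[v]])   # same factor conversions as above
}

backward_eliminate <- function(data, response, alpha = 0.05) {
  predictors <- setdiff(names(data), response)
  repeat {
    fit <- lm(reformulate(predictors, response = response), data = data)
    tab <- drop1(fit, test = "F")                  # one row per droppable term
    pvals <- setNames(tab[["Pr(>F)"]][-1],         # first row is "<none>"
                      rownames(tab)[-1])
    # Stop when every term is significant, or when only one term is left
    if (max(pvals) < alpha || length(predictors) == 1) return(fit)
    predictors <- setdiff(predictors, names(which.max(pvals)))
  }
}

final <- backward_eliminate(cars, "mpg")
formula(final)
```

Dropping whole terms keeps multi-level factors such as cyl and carb intact, which per-coefficient p-values cannot do.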

4. Residual plots
res <- resid(mdl)
qqnorm(res)
qqline(res)

hist(res, breaks = 5)

The residuals of the fitted model appear to be skewed to the right. Under this model, with wt and qsec held constant, a manual car is estimated to get 2.9358372 mpg more than an automatic car in this 1974 dataset (p = 0.047).
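
The point estimate and the visual check of the residuals can be supplemented with a confidence interval and a formal normality test. The sketch below is an addition to the original analysis: it refits the same three-predictor model on a fresh copy of the data (the names `cars` and `fit` are hypothetical) and assumes a Shapiro-Wilk test is a suitable check for this sample size.

```r
# Sketch: uncertainty of the transmission effect and a residual normality check.
cars <- datasets::mtcars
cars$am <- factor(cars$am)                       # 0 = automatic, 1 = manual
fit <- lm(mpg ~ wt + qsec + am, data = cars)     # same model as mdl above

confint(fit, "am1", level = 0.95)   # 95% CI for the manual vs automatic difference
shapiro.test(resid(fit))            # Shapiro-Wilk test of residual normality
```

A wide interval for am1 would indicate that the size of the transmission effect is only loosely determined by 32 cars, even when its sign is clear.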