Executive Summary

This document explores the relationship between a set of variables and miles per gallon (mpg) in the \(mtcars\) dataset from R's datasets package, to answer two questions: is a manual or an automatic transmission better for mpg, and how large is the difference?

The model that best explains mpg according to \(AIC\), given the mtcars dataset, is \[ mpg = 9.62 - 3.92 \cdot wt + 1.23 \cdot qsec + 2.94 \cdot am1 \]

It relates mpg to vehicle weight (\(wt\)), performance (\(qsec\) - 1/4 mile time), and transmission type (\(am\) - 0 = automatic, 1 = manual).

Everything else equal (that is, comparing vehicles of similar weight and quarter-mile time), a manual transmission gives, on average, almost 3 miles per gallon more than an automatic one. The 95% confidence interval for this difference runs from 0.05 to 5.83 mpg.

Introduction

The \(mtcars\) dataset was extracted from the 1974 Motor Trend US magazine, and comprises fuel consumption and 10 aspects of automobile design and performance for 32 automobiles (1973-74 models). The \(am\) variable gives the transmission type. Because it is binary, I convert \(am\) to a factor, which makes the results easier to interpret.

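# Load the data and treat transmission type as a factor (0 = automatic, 1 = manual)
data(mtcars, package = "datasets")
mtcars$am <- as.factor(mtcars$am)
# Mean mpg per transmission type, rounded to one decimal
mean0 <- round(mean(mtcars$mpg[mtcars$am == "0"]), 1)
mean1 <- round(mean(mtcars$mpg[mtcars$am == "1"]), 1)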

A quick exploratory graph (Figure 1 in the appendix) shows the raw mpg response to transmission type. In this sample, before adjusting for any other variable, manual transmissions average higher mpg (24.4) than automatic transmissions (17.1).

Strategy for Model Selection

I use the \(step\) function from R's stats package, which selects a model by AIC. Starting from a model with all available explanatory variables, the algorithm fits every model obtained by removing one variable from the current model and computes the AIC of each. It then drops the variable whose removal lowers the AIC the most, and repeats the process until no removal lowers the AIC any further.

lm1 <- lm(mpg ~ ., data = mtcars)  # full model: mpg against all other variables
slm1 <- step(lm1)                  # backward elimination by AIC

I have intentionally hidden the output of the \(step\) run to keep the report short (see the appendix). Below is the formulation that \(step\) selected to explain mpg, the one with the lowest \(AIC\).

summary(slm1)$call
## lm(formula = mpg ~ wt + qsec + am, data = mtcars)

According to this algorithm, the best way to explain \(mpg\) in the mtcars dataset is by relating it to vehicle weight (\(wt\)), vehicle performance (\(qsec\) - 1/4 mile time), and transmission type (\(am\)).
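
To make the selection procedure concrete, the sketch below reproduces a single elimination step by hand. It relies on the fact that \(step\) scores linear models with the value returned by `extractAIC`, which is n * log(RSS / n) + 2 * p and differs from `AIC()` by an additive constant. The names `full`, `n`, `p`, `aic_by_hand`, `candidates`, and `best` are illustrative and not part of the original analysis.

```r
data(mtcars, package = "datasets")
mtcars$am <- as.factor(mtcars$am)

# Full model with all ten predictors, as in the report
full <- lm(mpg ~ ., data = mtcars)

# AIC on the scale reported by step(): n * log(RSS / n) + 2 * p
n <- nrow(mtcars)
p <- length(coef(full))            # number of coefficients, including the intercept
aic_by_hand <- n * log(sum(resid(full)^2) / n) + 2 * p
aic_by_hand                        # compare with extractAIC(full)[2]

# AIC of every model obtained by deleting a single term
candidates <- drop1(full)
candidates

# The term whose removal gives the lowest AIC is the one dropped first
best <- rownames(candidates)[which.min(candidates$AIC)]
best
```

This corresponds to the "Start" table in the appendix trace, where the full model scores 70.9 and removing cyl gives the lowest AIC (68.915).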

Model Quality Diagnosis

For the quality of the model, refer to the residual diagnostic plots in Figure 2 in the appendix.

Model Results - interpreting the coefficients

summary(slm1)$coefficients
##              Estimate Std. Error   t value     Pr(>|t|)
## (Intercept)  9.617781  6.9595930  1.381946 1.779152e-01
## wt          -3.916504  0.7112016 -5.506882 6.952711e-06
## qsec         1.225886  0.2886696  4.246676 2.161737e-04
## am1          2.935837  1.4109045  2.080819 4.671551e-02
# 95% confidence interval for am1: estimate +/- t quantile * standard error (upper bound first)
cf <- summary(slm1)$coefficients
round(cf[4, 1] + c(1, -1) * qt(.975, summary(slm1)$df[2]) * cf[4, 2], 3)
## [1] 5.826 0.046

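Holding weight and quarter-mile time constant, the average difference in miles per gallon between a manual and an automatic transmission equals the am1 estimate (2.94 mpg). The last two values printed are the bounds of its 95% confidence interval, upper bound first (0.046 to 5.826 mpg). As a final remark, the signs of the other two coefficients show that higher weight lowers mpg, and so does faster acceleration: each additional second of quarter-mile time (a slower car) is associated with about 1.23 more mpg. Both effects are statistically significant.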
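
As a cross-check on the interval above, the same bounds can be obtained with `confint`, which returns them in lower, upper order. The sketch refits the selected model directly rather than rerunning \(step\), and then compares predictions for two hypothetical cars that differ only in transmission type. The objects `fit`, `ci`, `cars`, and `pred`, and the weight and quarter-mile values chosen, are illustrative assumptions.

```r
data(mtcars, package = "datasets")
mtcars$am <- as.factor(mtcars$am)

# The model selected by step(), fitted directly
fit <- lm(mpg ~ wt + qsec + am, data = mtcars)

# 95% confidence interval for the manual-vs-automatic difference
ci <- confint(fit, "am1", level = 0.95)
round(ci, 3)

# Two hypothetical cars, identical except for transmission type
cars <- data.frame(wt = c(3, 3), qsec = c(18, 18),
                   am = factor(c("0", "1"), levels = levels(mtcars$am)))
pred <- predict(fit, newdata = cars)
unname(pred[2] - pred[1])          # equals the am1 coefficient
```

Because the model has no interaction terms, the predicted difference is the same whatever weight and quarter-mile time the two cars share.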

Appendix

The impact of Automatic vs. Manual transmission trains on Vehicle Consumption

based on the mtcars data in the R Datasets package

Figure 1

data(mtcars, package = "datasets")
mtcars$am <- as.factor(mtcars$am)
plot(mpg  ~ am, mtcars)

Figure 2

lm1 <- lm(mpg ~ ., data = mtcars)
slm1 <- step(lm1)
par(mfrow = c(2,2))
plot(slm1)

The Akaike information criterion (AIC) run in R - the \(step\) function

lm1 <- lm(mpg ~ ., data = mtcars)
slm1 <- step(lm1)
## Start:  AIC=70.9
## mpg ~ cyl + disp + hp + drat + wt + qsec + vs + am + gear + carb
## 
##        Df Sum of Sq    RSS    AIC
## - cyl   1    0.0799 147.57 68.915
## - vs    1    0.1601 147.66 68.932
## - carb  1    0.4067 147.90 68.986
## - gear  1    1.3531 148.85 69.190
## - drat  1    1.6270 149.12 69.249
## - disp  1    3.9167 151.41 69.736
## - hp    1    6.8399 154.33 70.348
## - qsec  1    8.8641 156.36 70.765
## <none>              147.49 70.898
## - am    1   10.5467 158.04 71.108
## - wt    1   27.0144 174.51 74.280
## 
## Step:  AIC=68.92
## mpg ~ disp + hp + drat + wt + qsec + vs + am + gear + carb
## 
##        Df Sum of Sq    RSS    AIC
## - vs    1    0.2685 147.84 66.973
## - carb  1    0.5201 148.09 67.028
## - gear  1    1.8211 149.40 67.308
## - drat  1    1.9826 149.56 67.342
## - disp  1    3.9009 151.47 67.750
## - hp    1    7.3632 154.94 68.473
## <none>              147.57 68.915
## - qsec  1   10.0933 157.67 69.032
## - am    1   11.8359 159.41 69.384
## - wt    1   27.0280 174.60 72.297
## 
## Step:  AIC=66.97
## mpg ~ disp + hp + drat + wt + qsec + am + gear + carb
## 
##        Df Sum of Sq    RSS    AIC
## - carb  1    0.6855 148.53 65.121
## - gear  1    2.1437 149.99 65.434
## - drat  1    2.2139 150.06 65.449
## - disp  1    3.6467 151.49 65.753
## - hp    1    7.1060 154.95 66.475
## <none>              147.84 66.973
## - am    1   11.5694 159.41 67.384
## - qsec  1   15.6830 163.53 68.200
## - wt    1   27.3799 175.22 70.410
## 
## Step:  AIC=65.12
## mpg ~ disp + hp + drat + wt + qsec + am + gear
## 
##        Df Sum of Sq    RSS    AIC
## - gear  1     1.565 150.09 63.457
## - drat  1     1.932 150.46 63.535
## <none>              148.53 65.121
## - disp  1    10.110 158.64 65.229
## - am    1    12.323 160.85 65.672
## - hp    1    14.826 163.35 66.166
## - qsec  1    26.408 174.94 68.358
## - wt    1    69.127 217.66 75.350
## 
## Step:  AIC=63.46
## mpg ~ disp + hp + drat + wt + qsec + am
## 
##        Df Sum of Sq    RSS    AIC
## - drat  1     3.345 153.44 62.162
## - disp  1     8.545 158.64 63.229
## <none>              150.09 63.457
## - hp    1    13.285 163.38 64.171
## - am    1    20.036 170.13 65.466
## - qsec  1    25.574 175.67 66.491
## - wt    1    67.572 217.66 73.351
## 
## Step:  AIC=62.16
## mpg ~ disp + hp + wt + qsec + am
## 
##        Df Sum of Sq    RSS    AIC
## - disp  1     6.629 160.07 61.515
## <none>              153.44 62.162
## - hp    1    12.572 166.01 62.682
## - qsec  1    26.470 179.91 65.255
## - am    1    32.198 185.63 66.258
## - wt    1    69.043 222.48 72.051
## 
## Step:  AIC=61.52
## mpg ~ hp + wt + qsec + am
## 
##        Df Sum of Sq    RSS    AIC
## - hp    1     9.219 169.29 61.307
## <none>              160.07 61.515
## - qsec  1    20.225 180.29 63.323
## - am    1    25.993 186.06 64.331
## - wt    1    78.494 238.56 72.284
## 
## Step:  AIC=61.31
## mpg ~ wt + qsec + am
## 
##        Df Sum of Sq    RSS    AIC
## <none>              169.29 61.307
## - am    1    26.178 195.46 63.908
## - qsec  1   109.034 278.32 75.217
## - wt    1   183.347 352.63 82.790