Motor Trend Car Road Tests - Effects of Transmission Types on MPG

Author: Ken Ho

Synopsis

In this project, we analyze the mtcars dataset and explore the linear relationship between a set of variables and miles per gallon.

This analysis addresses two questions:

  • Is an automatic or manual transmission better for MPG?
  • How large is the difference in MPG between automatic and manual transmissions?

The results of this analysis are:

  • Manual transmission is better than automatic transmission for MPG.
  • Manual transmission is 2.94 mpg more fuel efficient than automatic transmission while holding other regressors constant.

Exploratory Data Analyses

t <- t.test(mpg ~ am, data = mtcars, paired = FALSE, var.equal = FALSE)
t
## 
##  Welch Two Sample t-test
## 
## data:  mpg by am
## t = -3.7671, df = 18.332, p-value = 0.001374
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -11.280194  -3.209684
## sample estimates:
## mean in group 0 mean in group 1 
##        17.14737        24.39231

The exploratory data analysis shows that:

  • The t-test has a p-value of 1.374e-03 (< 0.05) and a 95% confidence interval of (-11.28, -3.21) for mean(automatic) - mean(manual), so we reject the null hypothesis that mean MPG is the same for the two transmission types.
  • The boxplot in the appendix below shows that manual transmission has better MPG than automatic transmission.
  • The Pairs plot in the appendix below shows that there are several variables that have high correlation with mpg.
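
As a rough numeric companion to the pairs plot, the correlations with mpg can be ranked directly. This is a minimal sketch and not part of the original analysis; the name `cor_mpg` is introduced here purely for illustration.

```r
# Pearson correlation of every variable in mtcars with mpg
cor_mpg <- cor(mtcars)[, "mpg"]
cor_mpg <- cor_mpg[names(cor_mpg) != "mpg"]   # drop mpg's correlation with itself

# Rank by absolute strength, strongest first
cor_mpg[order(abs(cor_mpg), decreasing = TRUE)]
```

Weight (wt) shows a strong negative correlation with mpg, which is consistent with it surviving the stepwise selection later in the report.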

Model Selection

Single variable linear regression model - Model #1:

model_1 <- lm(mpg ~ am, data = mtcars)
summary(model_1)$coef
##              Estimate Std. Error   t value     Pr(>|t|)
## (Intercept) 17.147368   1.124603 15.247492 1.133983e-15
## am           7.244939   1.764422  4.106127 2.850207e-04
summary(model_1)$adj.r.squared
## [1] 0.3384589

Multivariable linear regression model (Stepwise Regression) - Model #2:

fitAll <- lm(mpg ~ . , data = mtcars)
model_2 <- step(fitAll, direction = "both")    # stepwise regression
summary(model_2)$coef
##              Estimate Std. Error   t value     Pr(>|t|)
## (Intercept)  9.617781  6.9595930  1.381946 1.779152e-01
## wt          -3.916504  0.7112016 -5.506882 6.952711e-06
## qsec         1.225886  0.2886696  4.246676 2.161737e-04
## am           2.935837  1.4109045  2.080819 4.671551e-02
summary(model_2)$adj.r.squared
## [1] 0.8335561

The stepwise procedure retains three predictors in model #2: “wt” and “qsec” as adjustment variables (potential confounders) and “am” as the predictor of interest.

Compare the Adjusted R-squared values of the two models:

##         Adjusted R-squared
## model_1             0.3385
## model_2             0.8336
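
The code behind this comparison table is not shown in the report. One way it could be built is sketched below, rounding to four decimals; the object name `adj_r2` is hypothetical, and model #2 is refit here from the formula that `step()` selected so the block runs on its own.

```r
model_1 <- lm(mpg ~ am, data = mtcars)
model_2 <- lm(mpg ~ wt + qsec + am, data = mtcars)  # formula selected by step() above

# Collect the adjusted R-squared of each model in a one-column table
adj_r2 <- data.frame(
  "Adjusted R-squared" = round(c(summary(model_1)$adj.r.squared,
                                 summary(model_2)$adj.r.squared), 4),
  row.names   = c("model_1", "model_2"),
  check.names = FALSE   # keep the space and hyphen in the column name
)
adj_r2
```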

Now, let’s look at the Analysis of Variance Table of the models:

varTbl <- anova(model_1, model_2)
varTbl
## Analysis of Variance Table
## 
## Model 1: mpg ~ am
## Model 2: mpg ~ wt + qsec + am
##   Res.Df    RSS Df Sum of Sq      F   Pr(>F)    
## 1     30 720.90                                 
## 2     28 169.29  2    551.61 45.618 1.55e-09 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
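
The F statistic in this table can be reproduced by hand from the residual sums of squares of the two nested models. The sketch below is illustrative only; the variable names are not from the original analysis.

```r
model_1 <- lm(mpg ~ am, data = mtcars)
model_2 <- lm(mpg ~ wt + qsec + am, data = mtcars)

rss_1 <- deviance(model_1)      # residual sum of squares, smaller model
rss_2 <- deviance(model_2)      # residual sum of squares, larger model
df_1  <- df.residual(model_1)   # residual degrees of freedom, smaller model
df_2  <- df.residual(model_2)   # residual degrees of freedom, larger model

# F = (drop in RSS per added parameter) / (residual variance of the larger model)
f_stat <- ((rss_1 - rss_2) / (df_1 - df_2)) / (rss_2 / df_2)
p_val  <- pf(f_stat, df_1 - df_2, df_2, lower.tail = FALSE)
c(F = f_stat, p = p_val)
```

With the values printed above, this is (551.61 / 2) / (169.29 / 28), which is about 45.6.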

Finally, the VIF (variance inflation factors) of model #2:

library(car)    # provides vif()
vif(model_2)
##       wt     qsec       am 
## 2.482952 1.364339 2.541437

Findings:

  • Model #2 has an adjusted R-squared of 0.8336; that is, about 83% of the variation in mpg is explained by this linear model (after adjusting for the number of predictors), compared with about 34% for model #1.
  • The analysis of variance table gives a p-value of 1.55e-09 (< 0.05) for the comparison of model #2 against model #1, which indicates that adding “wt” and “qsec” significantly improves the fit.
  • The variance inflation factors for model #2 are all below 3, so multicollinearity among the predictors is modest. Hence model #2 is chosen as the regression model for this analysis.
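
For readers who want to see where the variance inflation factors come from: for a predictor j, VIF_j = 1 / (1 - R_j^2), where R_j^2 is the R-squared from regressing predictor j on the remaining predictors. The sketch below computes this with base R only; `predictors` and `vif_manual` are names made up for this illustration.

```r
predictors <- c("wt", "qsec", "am")

# For each predictor, regress it on the other two and convert R-squared to a VIF
vif_manual <- sapply(predictors, function(p) {
  others <- setdiff(predictors, p)
  aux    <- lm(reformulate(others, response = p), data = mtcars)
  1 / (1 - summary(aux)$r.squared)
})
vif_manual
```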

Residual Plot and Diagnostics

See the Residual Plots and Diagnostics sections in the Appendix for details.

Findings:

  • No systematic patterns or large outlying observations found in the Residual Plots.
  • There are 2-3 observations that show low leverage but have fairly high influence.
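
The plots in the Appendix are read by eye. As a complement, observations can be flagged numerically. The sketch below uses two common rules of thumb (leverage above 2p/n and Cook's distance above 4/n); these cutoffs are assumptions added here, not thresholds used in the original analysis, and the object names are illustrative.

```r
model_2 <- lm(mpg ~ wt + qsec + am, data = mtcars)
n <- nrow(mtcars)
p <- length(coef(model_2))     # number of coefficients, including the intercept

lev  <- hatvalues(model_2)       # leverage of each observation
cook <- cooks.distance(model_2)  # overall influence of each observation

# Rule-of-thumb cutoffs (assumed, not from the original report)
high_leverage  <- names(lev)[lev > 2 * p / n]
high_influence <- names(cook)[cook > 4 / n]

high_leverage
high_influence
```

Cars that appear in `high_influence` but not in `high_leverage` would match the pattern described above.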

Interpretation of Coefficients

To give the intercept a meaningful interpretation, we center both “wt” and “qsec” at their sample means:

fitCentered <- lm(mpg ~ I(wt - mean(wt)) + I(qsec - mean(qsec)) + factor(am), 
                  data = mtcars)
summary(fitCentered)$coef
##                       Estimate Std. Error   t value     Pr(>|t|)
## (Intercept)          18.897941  0.7193542 26.270704 2.855851e-21
## I(wt - mean(wt))     -3.916504  0.7112016 -5.506882 6.952711e-06
## I(qsec - mean(qsec))  1.225886  0.2886696  4.246676 2.161737e-04
## factor(am)1           2.935837  1.4109045  2.080819 4.671551e-02

Interpretation:

  • Intercept: an automatic-transmission car (am = 0) of average weight and average quarter-mile time is expected to get 18.90 mpg.
  • wt: each additional 1,000 lb of weight lowers expected fuel efficiency by 3.92 mpg, holding other regressors constant.
  • qsec: each additional second of quarter-mile time raises expected fuel efficiency by 1.23 mpg, holding other regressors constant.
  • am: manual transmission raises expected fuel efficiency by 2.94 mpg relative to automatic, holding other regressors constant.
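
These interpretations can be checked with a small prediction. The sketch below uses the uncentered model (same slopes as the centered fit) and predicts mpg for a hypothetical car of average weight and average quarter-mile time under each transmission type. The uncentered model is used because, as far as I can tell, calling predict() on a formula containing `mean(wt)` would recompute the mean from the new data. `avg_car` and `pred` are illustrative names.

```r
model_2 <- lm(mpg ~ wt + qsec + am, data = mtcars)

# One "average" car, once as automatic (am = 0) and once as manual (am = 1)
avg_car <- data.frame(wt   = mean(mtcars$wt),
                      qsec = mean(mtcars$qsec),
                      am   = c(0, 1))

pred <- predict(model_2, newdata = avg_car)
names(pred) <- c("automatic", "manual")
pred         # the automatic prediction equals the centered intercept
diff(pred)   # manual minus automatic equals the am coefficient
```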

Quantification of the Uncertainty

Below are the 95% confidence intervals for the intercept and the regression coefficients:

confint(fitCentered)
##                            2.5 %    97.5 %
## (Intercept)          17.42441087 20.371471
## I(wt - mean(wt))     -5.37333423 -2.459673
## I(qsec - mean(qsec))  0.63457320  1.817199
## factor(am)1           0.04573031  5.825944
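
Each interval is the estimate plus or minus a t quantile times the standard error. A minimal sketch for the transmission coefficient is shown below, using the uncentered model, whose “am” estimate and standard error are the same as in the centered fit; the names `fit`, `t_crit`, and `ci_am` are illustrative.

```r
fit <- lm(mpg ~ wt + qsec + am, data = mtcars)

est <- coef(summary(fit))["am", "Estimate"]
se  <- coef(summary(fit))["am", "Std. Error"]

# Two-sided 95% interval uses the 97.5% quantile of t with residual df (28 here)
t_crit <- qt(0.975, df = df.residual(fit))
ci_am  <- c(lower = est - t_crit * se, upper = est + t_crit * se)
ci_am
```

The lower bound sits only just above zero, so the transmission effect, while significant at the 5% level, is estimated with considerable uncertainty.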

Conclusions

  • The multivariable regression model was chosen because it explains substantially more of the variation in mpg (adjusted R-squared of 0.83 versus 0.34 for the single-variable model).
  • Manual transmission is better than automatic transmission for MPG.
  • Manual transmission is 2.94 mpg more fuel efficient than automatic transmission while holding other regressors constant, with a 95% confidence interval of (0.05, 5.83).

Appendix

Exploratory Data Analysis

Boxplot

library(ggplot2)    # provides ggplot(), geom_boxplot(), labs()
mtcars2 <- mtcars
mtcars2$txType <- factor(mtcars$am, labels = c("Automatic","Manual"))
ggplot(mtcars2, aes(x = txType, y = mpg, fill = txType)) +
  geom_boxplot() + 
  labs(title = "Miles Per Gallon by Transmission Type", 
       x = "Transmission Type", 
       y = "Miles Per Gallon") +
  scale_fill_discrete(name = "Transmission")

Pairs Plot

library(GGally)    # provides ggpairs()
g <- ggpairs(mtcars, lower = list(continuous = "smooth"))
g

Residual Plots

par(mfrow = c(2, 2))
plot(model_2)

Diagnostics

par(mfrow = c(1, 2))
# Leverage
plot(hatvalues(model_2), main="Leverage")
# Influence
#plot(rstandard(model_2))
plot(rstudent(model_2), main="Studentized Residuals")

par(mfrow = c(1, 2))
plot(dffits(model_2), main="Influence - dffits")
plot(cooks.distance(model_2), main="Influence - Cook's Distance")

par(mfrow = c(1, 2))
plot(dfbetas(model_2)[, 2], main="Influence - dfbetas of \"wt\"")
plot(dfbetas(model_2)[, 3], main="Influence - dfbetas of \"qsec\"")