Fuel Consumption Analysis: Difference Between Automatic and Manual Transmissions


I. Executive Summary

This document is the final report for the Peer Assessment project of Coursera’s Regression Models course, part of the Data Science Specialization. It was written in RStudio with knitr and is meant to be published in PDF format.
This analysis is written as a study for Motor Trend, a magazine about the automotive industry. Using a dataset of a collection of cars (mtcars), we explore the relationship between a set of variables described below and fuel autonomy in miles per gallon (MPG), which is the outcome. We are particularly interested in two questions:

  • Is an automatic or manual transmission better for MPG?
  • What is the MPG difference between automatic and manual transmissions?

In order to answer these questions, we performed a brief exploratory data analysis and then used hypothesis testing and linear regression to make the necessary inferences. Both simple and multivariable linear regression were used, with stepwise model selection by AIC deciding which variables enter the final model. The main findings are:

  • Manual transmission is better than automatic for MPG. In a simple linear regression of MPG on transmission, cars with manual transmission travel 7.245 more miles per gallon on average than cars with automatic transmission.
  • In the multivariable regression that also includes weight (wt) and quarter mile time (qsec), the adjusted estimate is smaller: manual transmission cars get 2.936 more miles per gallon than automatic ones when the other variables are held constant.


II. Data Loading and Exploratory Analysis

a) Dataset Overview

For this analysis we use the mtcars dataset, which was extracted from the 1974 Motor Trend US magazine and comprises fuel autonomy and 10 other aspects of automobile design and performance for 32 automobiles (1973-74 models). The table below gives a brief description of the variables in the dataset:

column   variable   description           unit
[, 1]    mpg        fuel autonomy         miles/US gallon
[, 2]    cyl        number of cylinders   number
[, 3]    disp       displacement          cu.in.
[, 4]    hp         gross power           horsepower
[, 5]    drat       rear axle ratio       ratio
[, 6]    wt         car weight            lb/1000
[, 7]    qsec       1/4 mile time         seconds
[, 8]    vs         engine type           0 = V engine, 1 = straight engine
[, 9]    am         transmission          0 = automatic, 1 = manual
[,10]    gear       forward gears         number
[,11]    carb       carburetors           number

b) Environment Preparation

We first load the R libraries that are necessary for the analysis.

rm(list=ls())                # clear the workspace before starting
setwd("~/Cursos/Data Science/07 Regression Models/Projetos")  # author's local project folder; adjust or omit
library(knitr)
library(ggplot2)
library(GGally)
library(datasets)
library(MASS)

c) Data Loading

The next step is loading the dataset. Its first six rows are shown below.

data(mtcars)
kable(head(mtcars),align = 'c')
                    mpg    cyl   disp   hp    drat   wt      qsec    vs   am   gear   carb
Mazda RX4           21.0   6     160    110   3.90   2.620   16.46   0    1    4      4
Mazda RX4 Wag       21.0   6     160    110   3.90   2.875   17.02   0    1    4      4
Datsun 710          22.8   4     108    93    3.85   2.320   18.61   1    1    4      1
Hornet 4 Drive      21.4   6     258    110   3.08   3.215   19.44   1    0    3      1
Hornet Sportabout   18.7   8     360    175   3.15   3.440   17.02   0    0    3      2
Valiant             18.1   6     225    105   2.76   3.460   20.22   1    0    3      1

d) Exploratory Analysis

Data overview

Initially, we take a quick look at the MPG variable for automatic and manual transmission cars separately. Summary statistics and a boxplot of these values are shown below.

trAutom  <- mtcars$mpg[mtcars$am == 0]
trManual <- mtcars$mpg[mtcars$am == 1]
summary(trAutom)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   10.40   14.95   17.30   17.15   19.20   24.40
summary(trManual)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   15.00   21.00   22.80   24.39   30.40   33.90
ggplot(mtcars,aes(y = mpg, x = factor(am), fill=factor(am))) + 
    geom_boxplot() + geom_jitter() +
    ggtitle("Fuel Autonomy (in miles/US gallon)") +
    ylab("mpg") + xlab("transmission type") +
    scale_x_discrete(breaks=NULL) +
    scale_fill_discrete(name="transmission type", labels=c("automatic","manual"))

t Test

To check whether MPG differs significantly between automatic and manual transmissions (and so justify further analysis), a t test was performed on the data.

  1. F Test for equal variances:
var.test(trAutom,trManual)
## 
##  F test to compare two variances
## 
## data:  trAutom and trManual
## F = 0.38656, num df = 18, denom df = 12, p-value = 0.06691
## alternative hypothesis: true ratio of variances is not equal to 1
## 95 percent confidence interval:
##  0.1243721 1.0703429
## sample estimates:
## ratio of variances 
##          0.3865615

With a p-value of 0.067, the F test does not reject equal variances at the 5% level. Because the evidence is borderline, we nevertheless use the Welch t test, which does not assume equal variances. In fact, the t test leads to the same conclusion whether or not equal variances are assumed.

  2. t Test for differences in the averages:
t.test(trAutom, trManual, paired = FALSE, alternative="two.sided", var.equal=FALSE)
## 
##  Welch Two Sample t-test
## 
## data:  trAutom and trManual
## t = -3.7671, df = 18.332, p-value = 0.001374
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -11.280194  -3.209684
## sample estimates:
## mean of x mean of y 
##  17.14737  24.39231

The t Test p-value is 0.0014, which indicates a significant difference between the mean MPG of automatic and manual transmissions (manual transmission is 7.245 MPG higher on average).
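
The remark that the choice of variance assumption does not change the conclusion can be checked directly. The lines below are a minimal sketch, not part of the original analysis, that runs both versions of the test on the same two groups; the object names welch, pooled, pvals and meanDiff are introduced here for illustration only.

```r
# Sketch: compare the Welch and the pooled-variance t tests on the same data.
data(mtcars)
trAutom  <- mtcars$mpg[mtcars$am == 0]
trManual <- mtcars$mpg[mtcars$am == 1]

welch  <- t.test(trAutom, trManual, var.equal = FALSE)  # unequal variances
pooled <- t.test(trAutom, trManual, var.equal = TRUE)   # equal variances

# Collect both p-values side by side.
pvals <- c(welch = welch$p.value, pooled = pooled$p.value)
print(pvals)

# Difference in sample means (manual minus automatic).
meanDiff <- mean(trManual) - mean(trAutom)
print(meanDiff)
```

If both p-values fall below 0.05, the two versions of the test agree on rejecting equal means.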

Data Correlations

A first glimpse of the correlations among all the variables, including MPG, is shown in the plot matrix below.

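ggpairs(mtcars,
        # GGally 1.0+ no longer accepts `params`; options are passed through wrap()
        lower = list(continuous = wrap("smooth_loess", colour = "blue")),
        diag  = list(continuous = wrap("barDiag", colour = "blue")),
        upper = list(continuous = wrap("cor", size = 3)),  # text size chosen for readability
        axisLabels = "show")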

Most of the variables are correlated with MPG, and many of them are also correlated with each other. For that reason, it is advisable to run a model selection procedure to separate the ones that really add information about MPG.
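
The correlations read off the plot matrix can also be listed numerically. The following is a small sketch using only base R; the names mpgCor and wtDisp are illustrative and do not appear in the original report.

```r
# Sketch: numeric view of the correlations discussed above.
data(mtcars)

# Pearson correlation of every other variable with mpg,
# sorted from most negative to most positive.
mpgCor <- sort(cor(mtcars)[, "mpg"])
mpgCor <- mpgCor[names(mpgCor) != "mpg"]
print(round(mpgCor, 3))

# Correlation between two predictors, to show that the
# predictors are also related to each other.
wtDisp <- cor(mtcars$wt, mtcars$disp)
print(round(wtDisp, 3))
```

Strong correlations among the predictors themselves are the reason a single-variable comparison can overstate the effect of transmission.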


III. Regression Analysis

a) Linear Regression

A first linear regression, using only MPG and transmission type (am) as variables, was fitted to show the impact of transmission on MPG without taking the other variables into account.

trLM <- lm(mpg ~ am, data = mtcars)
summary(trLM)$coefficients
##              Estimate Std. Error   t value     Pr(>|t|)
## (Intercept) 17.147368   1.124603 15.247492 1.133983e-15
## am           7.244939   1.764422  4.106127 2.850207e-04

As noted before, it shows a large difference in MPG in favor of manual transmission (+7.245 miles per gallon) when the other variables are not considered.
The correlation plots show that other variables are also related to MPG, so a multivariable regression analysis is performed below.
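
With a single 0/1 predictor, the regression above simply reproduces the two group means: the intercept is the mean MPG of automatic cars and the am coefficient is the difference between the group means. A short sketch that checks this (the names groupMeans, intercept and slope are illustrative):

```r
# Sketch: the simple regression on a 0/1 variable equals a comparison of means.
data(mtcars)
trLM <- lm(mpg ~ am, data = mtcars)

# Mean mpg per transmission type; names are "0" (automatic) and "1" (manual).
groupMeans <- tapply(mtcars$mpg, mtcars$am, mean)

intercept <- unname(coef(trLM)["(Intercept)"])
slope     <- unname(coef(trLM)["am"])

print(c(intercept = intercept, automaticMean = unname(groupMeans["0"])))
print(c(slope = slope, meanDifference = unname(groupMeans["1"] - groupMeans["0"])))
```

This is why the 7.245 figure matches the difference between the sample means reported by the t test.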

b) Multivariable Regression

Including all variables we have:

trMVAR <- lm(mpg ~ . , data = mtcars)
summary(trMVAR)$coefficients
##                Estimate  Std. Error    t value   Pr(>|t|)
## (Intercept) 12.30337416 18.71788443  0.6573058 0.51812440
## cyl         -0.11144048  1.04502336 -0.1066392 0.91608738
## disp         0.01333524  0.01785750  0.7467585 0.46348865
## hp          -0.02148212  0.02176858 -0.9868407 0.33495531
## drat         0.78711097  1.63537307  0.4813036 0.63527790
## wt          -3.71530393  1.89441430 -1.9611887 0.06325215
## qsec         0.82104075  0.73084480  1.1234133 0.27394127
## vs           0.31776281  2.10450861  0.1509915 0.88142347
## am           2.52022689  2.05665055  1.2254035 0.23398971
## gear         0.65541302  1.49325996  0.4389142 0.66520643
## carb        -0.19941925  0.82875250 -0.2406258 0.81217871

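We may observe that every coefficient has a p-value higher than 0.05, so no single variable is statistically significant once all the others are in the model. This does not mean that none of them affects MPG: many of the predictors are strongly correlated with each other, which inflates the standard errors.
To find a smaller set of variables that explains MPG well, a stepwise model selection by AIC (using the MASS package stepAIC function) is performed.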
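
One common way to quantify how much the predictors overlap is the variance inflation factor (VIF), which is not computed in the original analysis. The sketch below derives it with base R only, assuming the usual definition VIF = 1 / (1 - R²), where R² comes from regressing each predictor on all the others; the names predictors and vif are illustrative.

```r
# Sketch: variance inflation factors for the full model, using base R only.
data(mtcars)
predictors <- setdiff(names(mtcars), "mpg")

vif <- sapply(predictors, function(v) {
  others <- setdiff(predictors, v)
  # Regress predictor v on all the remaining predictors.
  r2 <- summary(lm(reformulate(others, response = v), data = mtcars))$r.squared
  1 / (1 - r2)
})

print(round(sort(vif, decreasing = TRUE), 2))
```

Values well above roughly 5 to 10 are commonly read as a sign of strong collinearity, which would explain why no coefficient in the full model is individually significant.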

c) Model Fitting

fitModel <- stepAIC(lm(mpg ~ . ,data=mtcars), direction = 'both', trace = FALSE)
fitModel
## 
## Call:
## lm(formula = mpg ~ wt + qsec + am, data = mtcars)
## 
## Coefficients:
## (Intercept)           wt         qsec           am  
##       9.618       -3.917        1.226        2.936

According to the selection above, the variables retained alongside transmission type (am) are the weight of the car (wt) and quarter mile time (qsec).
This means that the other variables add little explanatory power once these are included, largely because of the correlation among the predictors, so dropping them lowers the AIC of the model.
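
The effect of the selection can be illustrated by comparing the AIC of the three models discussed in this section. This is a minimal sketch under the assumption that a lower AIC indicates a better trade-off between fit and complexity; the names fullModel, stepModel and aics are introduced here and are not part of the original code.

```r
# Sketch: compare AIC for the simple, stepwise-selected and full models.
library(MASS)
data(mtcars)

fullModel <- lm(mpg ~ ., data = mtcars)
stepModel <- stepAIC(fullModel, direction = "both", trace = FALSE)

aics <- c(simple   = AIC(lm(mpg ~ am, data = mtcars)),
          stepwise = AIC(stepModel),
          full     = AIC(fullModel))
print(round(aics, 2))

# Terms kept by the stepwise search.
print(names(coef(stepModel)))
```

Note that stepAIC reports AIC on a slightly different scale than AIC(), but for models fitted to the same data the ranking is the same.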

d) Final Model

The final model, relating MPG to transmission (am), weight (wt) and quarter mile time (qsec), is:

finalModel <- lm(mpg ~ factor(am) + qsec + wt, data = mtcars)
summary(finalModel)$coefficients
##              Estimate Std. Error   t value     Pr(>|t|)
## (Intercept)  9.617781  6.9595930  1.381946 1.779152e-01
## factor(am)1  2.935837  1.4109045  2.080819 4.671551e-02
## qsec         1.225886  0.2886696  4.246676 2.161737e-04
## wt          -3.916504  0.7112016 -5.506882 6.952711e-06

In this model, the estimated effect of transmission on MPG is smaller because it is adjusted for weight and quarter mile time. With the other variables held constant, manual transmission adds only 2.936 miles per gallon on average.
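
Two additional checks, not part of the original report, help to put the adjusted estimate in context: a 95% confidence interval for the transmission coefficient, and an F test comparing the simple model with the final one. The sketch below uses only base R; the names ci, nested and pNested are illustrative.

```r
# Sketch: uncertainty of the adjusted estimate and a nested model comparison.
data(mtcars)
trLM       <- lm(mpg ~ am, data = mtcars)
finalModel <- lm(mpg ~ factor(am) + qsec + wt, data = mtcars)

# 95% confidence interval for the manual-transmission coefficient.
ci <- confint(finalModel, "factor(am)1", level = 0.95)
print(ci)

# F test: do qsec and wt improve on the model with transmission only?
nested  <- anova(trLM, finalModel)
pNested <- nested[["Pr(>F)"]][2]
print(nested)
```

An interval that stays above zero agrees with the p-value of 0.047 in the coefficient table, although its width shows that the size of the effect is not estimated very precisely.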

e) Residuals Analysis

par(mfrow = c(2, 2))
plot(finalModel)

The residual plots show no clear trend, and the Q-Q plot follows the normal line reasonably well. These observations support the validity of the model.
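
The visual check can be complemented with a few numbers. The following sketch, which is an addition and not part of the original analysis, runs a Shapiro-Wilk normality test on the residuals and lists the observations with the highest leverage; the names res, sw and lev are illustrative.

```r
# Sketch: numeric companions to the diagnostic plots.
data(mtcars)
finalModel <- lm(mpg ~ factor(am) + qsec + wt, data = mtcars)

# Shapiro-Wilk test: a small p-value would suggest non-normal residuals.
res <- residuals(finalModel)
sw  <- shapiro.test(res)
print(sw)

# Leverage (hat values); they sum to the number of fitted coefficients.
lev <- hatvalues(finalModel)
print(round(head(sort(lev, decreasing = TRUE), 3), 3))
```

With only 32 observations the test has limited power, so it is best read together with the plots rather than as a replacement for them.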


IV. Conclusions

In conclusion, the analysis above shows that:

  • Manual transmission gives better fuel autonomy (MPG) than automatic (+2.936 miles per gallon in favor of manual).
  • The final model for MPG, considering the most impacting variables, is: mpg = 9.618 - 3.917 wt + 1.226 qsec + 2.936 am
  • In this sense, for the same weight (wt) and quarter mile time (qsec), manual transmission cars get 2.936 miles per gallon more than automatic transmission cars.
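
As an illustration of the last point, the final model can be used to predict MPG for the same car under both transmission types. The sketch below assumes a hypothetical car with wt = 3.0 (3000 lb) and qsec = 18 seconds; these values and the names newCars, pred and gain are chosen for the example only.

```r
# Sketch: predicted mpg for one hypothetical car, automatic versus manual.
data(mtcars)
finalModel <- lm(mpg ~ wt + qsec + am, data = mtcars)

# Same weight and quarter mile time, only the transmission differs.
newCars <- data.frame(wt = c(3.0, 3.0), qsec = c(18, 18), am = c(0, 1))
pred <- predict(finalModel, newdata = newCars)
names(pred) <- c("automatic", "manual")
print(round(pred, 2))

# The gap between the two predictions equals the am coefficient.
gain <- unname(pred["manual"] - pred["automatic"])
print(gain)
```

Whatever values of wt and qsec are chosen, the gap between the two predictions stays the same, because the model has no interaction terms.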