Executive Summary

We use R’s “mtcars” dataset for our analysis. The data were extracted from the 1974 Motor Trend US magazine and comprise fuel consumption and 10 aspects of automobile design and performance for 32 automobiles (1973-74 models), giving 32 observations on 11 variables. We use this analysis to answer the following two questions:

  1. “Is an automatic or manual transmission better for MPG”
  2. “Quantify the MPG difference between automatic and manual transmissions”

Analysis

Load the essential libraries & Read the mtcars dataset

library(ggplot2)
library(dplyr)
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
  #Explore dataset
data(mtcars) 
head(mtcars, 3)#load the first 3 rows; we are interested in "mpg" and "am" columns
##                mpg cyl disp  hp drat    wt  qsec vs am gear carb
## Mazda RX4     21.0   6  160 110 3.90 2.620 16.46  0  1    4    4
## Mazda RX4 Wag 21.0   6  160 110 3.90 2.875 17.02  0  1    4    4
## Datsun 710    22.8   4  108  93 3.85 2.320 18.61  1  1    4    1
str(mtcars) #note that all data are numeric by default
## 'data.frame':    32 obs. of  11 variables:
##  $ mpg : num  21 21 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2 ...
##  $ cyl : num  6 6 4 6 8 6 8 4 4 6 ...
##  $ disp: num  160 160 108 258 360 ...
##  $ hp  : num  110 110 93 110 175 105 245 62 95 123 ...
##  $ drat: num  3.9 3.9 3.85 3.08 3.15 2.76 3.21 3.69 3.92 3.92 ...
##  $ wt  : num  2.62 2.88 2.32 3.21 3.44 ...
##  $ qsec: num  16.5 17 18.6 19.4 17 ...
##  $ vs  : num  0 0 1 1 0 1 0 1 1 1 ...
##  $ am  : num  1 1 1 0 0 0 0 0 0 0 ...
##  $ gear: num  4 4 4 3 3 3 3 4 4 4 ...
##  $ carb: num  4 4 1 1 2 1 4 2 2 4 ...

Process the data

We note that the “am” column captures transmission data (0 = automatic, 1 = manual). However, this variable is recorded as “numeric” in the dataset. We need to convert it to a factor variable to facilitate our analysis.

  #create a copy of the original dataset (in case we need to restore it)
mtcars_copy <- mtcars

  # change "am" from numeric to factor variable
mtcars$am <- factor(mtcars$am, labels=c("Automatic", "Manual"))
head(mtcars, 4)#load the first 4 rows to confirm that label changes have been made
##                 mpg cyl disp  hp drat    wt  qsec vs        am gear carb
## Mazda RX4      21.0   6  160 110 3.90 2.620 16.46  0    Manual    4    4
## Mazda RX4 Wag  21.0   6  160 110 3.90 2.875 17.02  0    Manual    4    4
## Datsun 710     22.8   4  108  93 3.85 2.320 18.61  1    Manual    4    1
## Hornet 4 Drive 21.4   6  258 110 3.08 3.215 19.44  1 Automatic    3    1
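
As a quick sanity check on the conversion, the number of cars in each transmission group can be tabulated. The following is a minimal sketch that works on a fresh copy of the data; the name `cars` is illustrative and is not an object used elsewhere in this analysis.

```r
# Sketch: verify the coding of "am" on a fresh copy of the data.
# `cars` is an illustrative name, not an object from the analysis above.
cars <- datasets::mtcars
cars$am <- factor(cars$am, levels = c(0, 1), labels = c("Automatic", "Manual"))

# Count the cars in each transmission group.
am_counts <- table(cars$am)
print(am_counts)
```

With mtcars this should report 19 automatic and 13 manual cars, so the two groups are of unequal size.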

Q1. “Is an automatic or manual transmission better for MPG”

We shall first perform exploratory data analysis and plot the data using boxplot.

ggplot(data=mtcars, aes(am, mpg)) +
  geom_boxplot() +
  labs(x="Transmission", y="Miles/(US) gallon",
       title="Plot of Miles Per Gallon vs Car Transmission (Automatic / Manual)")

We can see that cars with “Manual” transmission are more fuel-efficient than those with “Automatic” transmission, as is evident from the higher median value. In addition, the lower quartile for manual cars is higher than the upper quartile for automatic cars (the two boxes do not overlap), and this further supports the fuel advantage of manual cars. Based on this, we conclude that manual transmission vehicles are better in terms of MPG.
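
The reading of the boxplot can be backed with numbers. The sketch below computes the mpg quartiles per transmission group on a fresh copy of the data (`cars`, `mpg_quartiles` and `boxes_separated` are illustrative names). It assumes that `quantile()` with its default method gives the same box edges as the plot.

```r
# Sketch: numeric summary behind the boxplot, on a fresh copy of the data.
cars <- datasets::mtcars
cars$am <- factor(cars$am, levels = c(0, 1), labels = c("Automatic", "Manual"))

# Quartiles of mpg within each group (rows: quartiles, columns: groups).
mpg_quartiles <- sapply(split(cars$mpg, cars$am), quantile,
                        probs = c(0.25, 0.5, 0.75))
print(mpg_quartiles)

# TRUE if the bottom of the manual box lies above the top of the automatic box.
boxes_separated <- mpg_quartiles["25%", "Manual"] > mpg_quartiles["75%", "Automatic"]
print(boxes_separated)
```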

Q2. “Quantify the MPG difference between automatic and manual transmissions”

We shall now use 2 methods to quantify the mpg difference between automatic and manual transmissions: (i) Firstly, we shall do a simple t-test to ascertain whether the mean mpg of manual cars is significantly different from that of automatic cars. (ii) Secondly, we shall compare (using ANOVA) a multiple regression model against a simple regression model to ascertain whether other variables also matter, and whether the mean mpg of manual and automatic cars still differs when those variables are held constant.

(1) Using simple t-test on the data

We can use a t-test to evaluate this further. Specifically, we split the data into “Automatic” and “Manual” groups and test whether their mean miles per gallon differ.

auto_cars <- mtcars[mtcars$am=="Automatic",]
manual_cars <- mtcars[mtcars$am=="Manual",]
t.test(auto_cars$mpg, manual_cars$mpg)
## 
##  Welch Two Sample t-test
## 
## data:  auto_cars$mpg and manual_cars$mpg
## t = -3.7671, df = 18.332, p-value = 0.001374
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -11.280194  -3.209684
## sample estimates:
## mean of x mean of y 
##  17.14737  24.39231

Given that the p-value is small (0.001374), we reject the null hypothesis that the true difference in mean mpg between automatic and manual cars is 0. The mean miles per gallon (mpg) of manual cars (24.39) is higher than that of automatic cars (17.15), a difference of about 7.24 mpg (95% confidence interval: 3.21 to 11.28). We therefore conclude that manual transmission is better for MPG, although this comparison does not account for any other variable.
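
The same test can be run with the formula interface, which avoids splitting the data by hand and makes it easy to pull out the estimated difference. This is a minimal sketch on a fresh copy of the data; `cars`, `welch`, `mean_diff` and `ci` are illustrative names.

```r
# Sketch: Welch t-test through the formula interface, on a fresh copy of the data.
cars <- datasets::mtcars
cars$am <- factor(cars$am, levels = c(0, 1), labels = c("Automatic", "Manual"))

welch <- t.test(mpg ~ am, data = cars)

# Group means are ordered by factor level, so diff() gives Manual minus Automatic.
mean_diff <- unname(diff(welch$estimate))
# The confidence interval is reported for Automatic minus Manual.
ci <- welch$conf.int

cat("Difference in means (Manual - Automatic):", round(mean_diff, 2), "\n")
cat("95% CI for (Automatic - Manual):", round(ci, 2), "\n")
```

Because the interval is for automatic minus manual, both of its limits are negative when manual cars have the higher mean.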

(2) Using ANOVA to compare a simple linear model and a multiple regression model

(2.1) Model 1: Simple linear model (regressing mpg vs am)

  #we remove the intercept so that each coefficient is the mean mpg of its transmission group
simple_model <- lm(mpg~am-1, data=mtcars)
summary(simple_model)
## 
## Call:
## lm(formula = mpg ~ am - 1, data = mtcars)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -9.3923 -3.0923 -0.2974  3.2439  9.5077 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## amAutomatic   17.147      1.125   15.25 1.13e-15 ***
## amManual      24.392      1.360   17.94  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 4.902 on 30 degrees of freedom
## Multiple R-squared:  0.9487, Adjusted R-squared:  0.9452 
## F-statistic: 277.2 on 2 and 30 DF,  p-value: < 2.2e-16

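From the simple linear regression model, we note that the mean mpg of manual cars (24.392) is higher than that of automatic cars (17.147), a difference of about 7.245 mpg that matches the t-test above. Note that the R-squared of 0.9487 is inflated by the removal of the intercept, because without an intercept it is computed about zero rather than about the mean of mpg.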
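
For comparison, the same model can be fitted with the intercept kept, in which case the transmission coefficient is the difference between the two group means directly. The following is a sketch on a fresh copy of the data; `cars`, `with_intercept` and `r2` are illustrative names.

```r
# Sketch: the simple model with the intercept retained.
cars <- datasets::mtcars
cars$am <- factor(cars$am, levels = c(0, 1), labels = c("Automatic", "Manual"))

with_intercept <- lm(mpg ~ am, data = cars)

# (Intercept) is the automatic mean; amManual is Manual minus Automatic.
print(coef(with_intercept))

# R-squared measured about the mean of mpg, i.e. the share of the
# variation in mpg that transmission type alone accounts for.
r2 <- summary(with_intercept)$r.squared
print(r2)
```

The fitted values are identical in both parameterisations; only the meaning of the coefficients and the reference point for R-squared change.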

(2.2) Model 2: A multiple regression model

We shall regress mpg against all available variables in the dataset, and then use a stepwise method to remove variables that do not improve the model (as measured by AIC).

multiple_regression <- step(lm(mpg~., data=mtcars)) 

Based on the results of the stepwise regression, the model with the lowest AIC is selected, which is: mpg~wt + qsec + am. This means that the predictors retained for miles per gallon (mpg) are: weight (wt), quarter-mile time (qsec, a measure of acceleration), and transmission type (am). We shall use this as our complex model.

complex_model <- lm(mpg~wt + qsec + am-1, data=mtcars)
complex_model
## 
## Call:
## lm(formula = mpg ~ wt + qsec + am - 1, data = mtcars)
## 
## Coefficients:
##          wt         qsec  amAutomatic     amManual  
##      -3.917        1.226        9.618       12.554
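
The transmission effect implied by these coefficients is the gap between the two am terms. Fitting the same predictors with the intercept kept reports that gap as a single coefficient, together with a confidence interval. This is a sketch on a fresh copy of the data; `cars`, `adjusted` and `am_effect` are illustrative names.

```r
# Sketch: the complex model with the intercept retained.
cars <- datasets::mtcars
cars$am <- factor(cars$am, levels = c(0, 1), labels = c("Automatic", "Manual"))

adjusted <- lm(mpg ~ wt + qsec + am, data = cars)

# amManual is the Manual minus Automatic difference in mpg,
# holding wt and qsec constant.
am_effect <- coef(adjusted)[["amManual"]]
print(am_effect)

# 95% confidence interval for that adjusted difference.
print(confint(adjusted, "amManual", level = 0.95))
```

The width of the interval gives a sense of how precisely the adjusted difference is estimated from 32 cars.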

(2.3) ANOVA Test between simple and complex model

anova(simple_model, complex_model)
## Analysis of Variance Table
## 
## Model 1: mpg ~ am - 1
## Model 2: mpg ~ wt + qsec + am - 1
##   Res.Df    RSS Df Sum of Sq      F   Pr(>F)    
## 1     30 720.90                                 
## 2     28 169.29  2    551.61 45.618 1.55e-09 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

The low p-value (1.55e-09) means that we can reject the null hypothesis that the complex model fits no better than the simple model, in favour of the alternative hypothesis that, besides transmission, the weight and quarter-mile time of cars also have a significant impact on mpg. Holding weight and quarter-mile time constant, manual cars still offer better fuel efficiency than automatic cars, by about 2.94 mpg (12.554 - 9.618).
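
The F statistic in the ANOVA table can be reproduced from the residual sums of squares of the two nested models, which shows what the test is measuring. The sketch below refits both models on a fresh copy of the data; `cars`, `simple_fit`, `complex_fit` and the other names are illustrative.

```r
# Sketch: reproduce the nested-model F test by hand.
cars <- datasets::mtcars
cars$am <- factor(cars$am, levels = c(0, 1), labels = c("Automatic", "Manual"))

simple_fit  <- lm(mpg ~ am - 1, data = cars)
complex_fit <- lm(mpg ~ wt + qsec + am - 1, data = cars)

rss_simple  <- sum(resid(simple_fit)^2)
rss_complex <- sum(resid(complex_fit)^2)
df_complex  <- df.residual(complex_fit)
df_diff     <- df.residual(simple_fit) - df_complex  # number of added terms

# Drop in RSS per added term, relative to the residual variance
# of the larger model.
f_stat  <- ((rss_simple - rss_complex) / df_diff) / (rss_complex / df_complex)
p_value <- pf(f_stat, df_diff, df_complex, lower.tail = FALSE)

cat("F =", round(f_stat, 3), " p =", signif(p_value, 3), "\n")
```

A large F means that adding wt and qsec removes far more residual variation than would be expected by chance.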

(2.4) Plot wt and qsec differentiated by am (Automatic and Manual)

Let’s take a look at the wt and qsec plots and see if there’s a difference between automatic and manual cars.

# Change box plot colors by groups
require(gridExtra)# the gridExtra package needs to be installed
## Loading required package: gridExtra
## 
## Attaching package: 'gridExtra'
## The following object is masked from 'package:dplyr':
## 
##     combine
library(ggplot2)
plot1 <- ggplot(data=mtcars, aes(x=am, y=wt, fill=am)) +
  geom_boxplot()
plot2 <- ggplot(data=mtcars, aes(x=am, y=qsec, fill=am)) +
  geom_boxplot()
grid.arrange(plot1, plot2, ncol=2)

From the above plots, we note that automatic cars are heavier (median weight above 3,500 lbs; wt is recorded in units of 1,000 lbs) than manual cars (median weight below 2,500 lbs). Given the negative wt coefficient in the complex model (-3.917), the heavier weight of automatic cars probably contributes to their lower miles per gallon (mpg).

We note that the quarter-mile times (qsec) of manual and automatic cars overlap. qsec is the time in seconds taken to cover a quarter mile, so a larger value means slower acceleration. On balance, automatic cars are moderately slower (median qsec slightly below 18 seconds) than manual cars (median qsec about 17 seconds). Because the qsec coefficient in the complex model is positive (1.226), a longer quarter-mile time is associated with higher mpg, so this difference does not by itself explain the lower fuel efficiency of automatic cars. Deeper analysis would be required to understand how acceleration and transmission type interact.
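
The medians read off the two boxplots can be checked directly. This is a small sketch on a fresh copy of the data; `cars` and `group_medians` are illustrative names. Recall that wt is in units of 1,000 lbs and qsec is the quarter-mile time in seconds.

```r
# Sketch: median weight and quarter-mile time per transmission group.
cars <- datasets::mtcars
cars$am <- factor(cars$am, levels = c(0, 1), labels = c("Automatic", "Manual"))

# One row per group, with the median of wt and of qsec.
group_medians <- aggregate(cbind(wt, qsec) ~ am, data = cars, FUN = median)
print(group_medians)
```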

(2.5) Plot residuals

We shall use a Studentized Residual Plot from the olsrr package to detect potential outliers in our complex model. This method considers a point an outlier if its studentized residual has an absolute value higher than 3.

library(olsrr)
## 
## Attaching package: 'olsrr'
## The following object is masked from 'package:datasets':
## 
##     rivers
ols_plot_resid_stud(complex_model)

We note that two observations (17 and 18) have comparatively large studentized residuals, but both are still within the threshold of 3. No observation is therefore flagged as an outlier, and no further adjustment of our complex model is required on this account.
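
The same check can be done numerically with base R, without relying on the plot. The sketch below uses `rstudent()`, on the assumption that its externally studentized residuals are comparable to those drawn by olsrr; `cars`, `fit`, `stud` and `flagged` are illustrative names.

```r
# Sketch: flag observations whose studentized residual exceeds 3 in absolute value.
cars <- datasets::mtcars
cars$am <- factor(cars$am, levels = c(0, 1), labels = c("Automatic", "Manual"))

fit  <- lm(mpg ~ wt + qsec + am - 1, data = cars)
stud <- rstudent(fit)

# Names of the cars beyond the threshold (empty if there are none).
flagged <- names(stud)[abs(stud) > 3]
print(flagged)

# The three largest absolute studentized residuals, for context.
print(round(head(sort(abs(stud), decreasing = TRUE), 3), 2))
```

An empty result for `flagged` agrees with the plot: no car lies beyond the threshold of 3.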

Conclusion

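In conclusion, manual cars appear to have better fuel efficiency than automatic cars. The unadjusted difference in mean mpg is about 7.24, and after holding weight and quarter-mile time constant the difference is about 2.94 mpg.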