Executive Summary

This analysis seeks to answer the following questions about the mtcars data set:
1. Is an automatic or manual transmission better for MPG?
2. Can the MPG difference between automatic and manual transmission be quantified?

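The analysis concludes that manual transmissions are better for MPG. After adjusting for cylinders, displacement, horsepower, and weight, manual cars average about 1.8 MPG more than automatic cars, although this adjusted difference is not statistically significant (p = 0.22).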
My analysis is described below.

Data Processing

#Read data
data(mtcars)

#Examine data structure
str(mtcars)
'data.frame':   32 obs. of  11 variables:
 $ mpg : num  21 21 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2 ...
 $ cyl : num  6 6 4 6 8 6 8 4 4 6 ...
 $ disp: num  160 160 108 258 360 ...
 $ hp  : num  110 110 93 110 175 105 245 62 95 123 ...
 $ drat: num  3.9 3.9 3.85 3.08 3.15 2.76 3.21 3.69 3.92 3.92 ...
 $ wt  : num  2.62 2.88 2.32 3.21 3.44 ...
 $ qsec: num  16.5 17 18.6 19.4 17 ...
 $ vs  : num  0 0 1 1 0 1 0 1 1 1 ...
 $ am  : num  1 1 1 0 0 0 0 0 0 0 ...
 $ gear: num  4 4 4 3 3 3 3 4 4 4 ...
 $ carb: num  4 4 1 1 2 1 4 2 2 4 ...

#Coerce appropriate variables into factors
mtcars$cyl <- factor(mtcars$cyl)
mtcars$vs <- factor(mtcars$vs)
mtcars$gear <- factor(mtcars$gear)
mtcars$carb <- factor(mtcars$carb)
mtcars$am <- factor(mtcars$am, labels = c("Auto", "Manual"))

#Confirm variable conversion
str(mtcars)
'data.frame':   32 obs. of  11 variables:
 $ mpg : num  21 21 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2 ...
 $ cyl : Factor w/ 3 levels "4","6","8": 2 2 1 2 3 2 3 1 1 2 ...
 $ disp: num  160 160 108 258 360 ...
 $ hp  : num  110 110 93 110 175 105 245 62 95 123 ...
 $ drat: num  3.9 3.9 3.85 3.08 3.15 2.76 3.21 3.69 3.92 3.92 ...
 $ wt  : num  2.62 2.88 2.32 3.21 3.44 ...
 $ qsec: num  16.5 17 18.6 19.4 17 ...
 $ vs  : Factor w/ 2 levels "0","1": 1 1 2 2 1 2 1 2 2 2 ...
 $ am  : Factor w/ 2 levels "Auto","Manual": 2 2 2 1 1 1 1 1 1 1 ...
 $ gear: Factor w/ 3 levels "3","4","5": 2 2 2 1 1 1 1 2 2 2 ...
 $ carb: Factor w/ 6 levels "1","2","3","4",..: 4 4 1 1 2 1 4 2 2 4 ...
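
The separate coercions above could also be written more compactly. The following is a minimal sketch, assuming a working copy named `cars` and a helper vector `cat_cols` (both hypothetical names, used so the original `mtcars` object is left untouched):

```r
# Work on a copy so the built-in data set stays unchanged
cars <- datasets::mtcars

# Convert the plain categorical columns in one pass
cat_cols <- c("cyl", "vs", "gear", "carb")
cars[cat_cols] <- lapply(cars[cat_cols], factor)

# am gets descriptive labels: 0 = Auto, 1 = Manual
cars$am <- factor(cars$am, labels = c("Auto", "Manual"))

# List the columns that are now factors
names(cars)[sapply(cars, is.factor)]
```

Either form should yield the same structure; the loop version mainly reduces repetition when many columns need converting.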

Exploratory Analysis

library(ggplot2)

ggplot(mtcars,aes(x = am, y = mpg)) +
        geom_boxplot(fill = c("darkgreen","gold"))+
        labs(x = "Transmission Type",y="Miles Per Gallon", title = "Miles Per Gallon by Transmission Type" )

Conclusion 1: As shown by the plot above, manual transmissions are better for MPG, as these cars have a higher median MPG than cars with automatic transmissions.
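
The medians that the boxplot displays can also be checked numerically. This is a small sketch, assuming a fresh copy of the data named `cars` (a hypothetical name) with `am` labelled as in the Data Processing section:

```r
# Rebuild the labelled transmission factor on a copy of the data
cars <- datasets::mtcars
cars$am <- factor(cars$am, labels = c("Auto", "Manual"))

# Median MPG per transmission type (the centre line of each box)
med <- tapply(cars$mpg, cars$am, median)
med
```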

Regression Analysis

My exploratory analysis indicates there is a difference in MPG between automatic and manual transmissions. Thus, the next step is to fit a regression model to determine their relationship.

aggregate(mpg~am, data = mtcars, mean)
      am      mpg
1   Auto 17.14737
2 Manual 24.39231

Based on the table above, the mean MPG for automatic cars is about 7.2 lower than for manual cars. A t-test was performed to determine the statistical significance of this difference.

auto<-mtcars[mtcars$am=="Auto",]
man<-mtcars[mtcars$am=="Manual",]
t.test(auto$mpg, man$mpg)

    Welch Two Sample t-test

data:  auto$mpg and man$mpg
t = -3.7671, df = 18.332, p-value = 0.001374
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -11.280194  -3.209684
sample estimates:
mean of x mean of y 
 17.14737  24.39231 
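
The same Welch test can also be run through the formula interface of t.test(), which avoids splitting the data by hand and makes individual results easy to extract. A sketch, assuming a copy of the data named `cars` and a result object named `tt` (both hypothetical names):

```r
# Rebuild the labelled transmission factor on a copy of the data
cars <- datasets::mtcars
cars$am <- factor(cars$am, labels = c("Auto", "Manual"))

# Welch two-sample t-test of mpg by transmission type
tt <- t.test(mpg ~ am, data = cars)

tt$p.value                  # p-value of the test
tt$conf.int                 # 95% CI for mean(Auto) - mean(Manual)
unname(diff(tt$estimate))   # mean(Manual) - mean(Auto)
```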

A p-value of 0.00137 is below the 0.05 cutoff for statistical significance. Thus, we can reject the null hypothesis and conclude that there is a statistically significant difference between the average MPG of automatic cars and that of manual cars. A linear model was fitted to quantify this difference.

fit<-lm(mpg~am, mtcars)
summary(fit)

Call:
lm(formula = mpg ~ am, data = mtcars)

Residuals:
    Min      1Q  Median      3Q     Max 
-9.3923 -3.0923 -0.2974  3.2439  9.5077 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)   17.147      1.125  15.247 1.13e-15 ***
amManual       7.245      1.764   4.106 0.000285 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 4.902 on 30 degrees of freedom
Multiple R-squared:  0.3598,    Adjusted R-squared:  0.3385 
F-statistic: 16.86 on 1 and 30 DF,  p-value: 0.000285
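
In a model with a single two-level factor, the coefficients map directly onto the group means: the intercept is the mean of the reference level (Auto) and the amManual coefficient is the difference between the two means. The sketch below illustrates this, assuming the hypothetical names `cars`, `fit_simple`, and `group_means`:

```r
# Rebuild the labelled transmission factor on a copy of the data
cars <- datasets::mtcars
cars$am <- factor(cars$am, labels = c("Auto", "Manual"))

fit_simple <- lm(mpg ~ am, data = cars)
group_means <- tapply(cars$mpg, cars$am, mean)

coef(fit_simple)                              # intercept and amManual
group_means["Auto"]                           # equals the intercept
group_means["Manual"] - group_means["Auto"]   # equals the amManual coefficient
summary(fit_simple)$r.squared                 # share of MPG variance explained
```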

The linear model indicates the average MPG for automatic transmissions is 17.1, with manual transmissions being 7.2 MPG higher. However, the R-squared value shows that this model explains only 36% of the variance in MPG. A better option would be to build a multivariable linear regression model to quantify the difference.

Thus, a pairs plot was generated to determine the additional variables to include in the new model.

pairs(mpg ~., data =mtcars)
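
The pairs plot can be complemented with numeric correlations. The sketch below uses the original numeric coding of the data, since cor() requires numeric input; treating cyl as a number here is a simplifying assumption, and `cars_num`, `predictors`, and `cors` are hypothetical names:

```r
# Use the untouched numeric data set, since cor() needs numeric columns
cars_num <- datasets::mtcars

# Pearson correlation of mpg with every other variable
predictors <- setdiff(names(cars_num), "mpg")
cors <- sapply(cars_num[predictors], function(x) cor(cars_num$mpg, x))

# Rank variables by the strength of their correlation with mpg
sort(abs(cors), decreasing = TRUE)
```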

Based on the plot, the following variables are strongly correlated with mpg: cyl, disp, hp, and wt. A new model was fitted using these additional variables, and its results were compared to the initial model using the anova() function.

fit2<-lm(mpg~am + cyl + disp + hp + wt, data = mtcars)
summary(fit2)

Call:
lm(formula = mpg ~ am + cyl + disp + hp + wt, data = mtcars)

Residuals:
    Min      1Q  Median      3Q     Max 
-3.9374 -1.3347 -0.3903  1.1910  5.0757 

Coefficients:
             Estimate Std. Error t value Pr(>|t|)    
(Intercept) 33.864276   2.695416  12.564 2.67e-12 ***
amManual     1.806099   1.421079   1.271   0.2155    
cyl6        -3.136067   1.469090  -2.135   0.0428 *  
cyl8        -2.717781   2.898149  -0.938   0.3573    
disp         0.004088   0.012767   0.320   0.7515    
hp          -0.032480   0.013983  -2.323   0.0286 *  
wt          -2.738695   1.175978  -2.329   0.0282 *  
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 2.453 on 25 degrees of freedom
Multiple R-squared:  0.8664,    Adjusted R-squared:  0.8344 
F-statistic: 27.03 on 6 and 25 DF,  p-value: 8.861e-10

anova(fit, fit2)
Analysis of Variance Table

Model 1: mpg ~ am
Model 2: mpg ~ am + cyl + disp + hp + wt
  Res.Df    RSS Df Sum of Sq      F    Pr(>F)    
1     30 720.90                                  
2     25 150.41  5    570.49 18.965 8.637e-08 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

The ANOVA p-value of 8.637e-08 indicates that adding these variables significantly improves on the simple model that was fitted initially. The second model also has a higher R-squared value of 0.8664, so it explains 86.6% of the variance in MPG, a substantial improvement over the first. Note, however, that the amManual coefficient in the second model is no longer statistically significant (p = 0.2155). The second model's fit was further checked with diagnostic plots of its residuals.

par(mfrow=c(2,2))
plot(fit2)
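
To gauge how precisely the second model estimates the transmission effect, a confidence interval for the amManual coefficient can be computed with confint(). This is a sketch, assuming the hypothetical names `cars`, `fit_full`, and `ci`:

```r
# Rebuild the factors used by the second model on a copy of the data
cars <- datasets::mtcars
cars$cyl <- factor(cars$cyl)
cars$am <- factor(cars$am, labels = c("Auto", "Manual"))

fit_full <- lm(mpg ~ am + cyl + disp + hp + wt, data = cars)

# 95% confidence interval for the adjusted Manual vs. Auto difference
ci <- confint(fit_full, "amManual", level = 0.95)
ci
```

An interval that spans zero is consistent with the non-significant p-value reported for amManual in the model summary.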

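Conclusion 2: Holding cyl, disp, hp, and wt constant, the difference in MPG between automatic and manual transmissions is estimated at 1.81 MPG in favor of manual transmissions. This adjusted difference is not statistically significant at the 0.05 level (p = 0.2155).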