Executive Summary

At Motor Trend magazine, we want to settle an age-old debate: which is more fuel efficient, a manual or an automatic transmission? We also want to estimate the MPG difference between the two.

Based on our analysis of our car data, we have determined that there is a statistically significant difference in average MPG between automatic and manual transmissions.

We have concluded that, after adjusting for weight and acceleration, manual transmissions get an estimated 2.94 MPG more than automatic transmissions.

The Data

The data used for analysis contained 32 observations, with 11 variables.

Examining the Data

The data contained no missing values, and the data structure is satisfactory. The only change we made was to convert the am variable to a factor with levels A (automatic) and M (manual).
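
The preparation described above might look like the following sketch. It assumes that `cars` starts as a copy of R's built-in `mtcars` data set, where `am` is coded 0 for automatic and 1 for manual. The dimensions and summaries in this report are consistent with that data set, but the source of `cars` is an assumption here.

```r
# Assumption: `cars` starts as a copy of the built-in mtcars data set
cars <- mtcars

# Count missing values across the whole data frame
sum(is.na(cars))

# Inspect the structure (number of observations, variables, and their types)
str(cars)

# Recode am (0 = automatic, 1 = manual) as a factor with levels A and M
cars$am <- factor(cars$am, levels = c(0, 1), labels = c("A", "M"))
table(cars$am)
```

Passing `levels` and `labels` explicitly keeps the mapping from 0/1 to A/M visible instead of relying on the default ordering.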

Exploring the Data

What is the average MPG for automatic and manual transmissions?

# Calculate the mean MPG for auto and manual transmissions
meanMPG <- aggregate(mpg~am, cars, mean)
meanMPG
##   am      mpg
## 1  A 17.14737
## 2  M 24.39231

On average, manual transmissions have better MPG than automatic transmissions (24.39 vs. 17.15 MPG).

Data Visualization

library(ggplot2)
# count plot: point size shows how many cars share each mpg value
ggplot(cars, aes(mpg, am)) +
        geom_count(col = 'blue', show.legend = TRUE) +
        labs(x="Miles per Gallon", y="Transmission", title= "MPG Auto vs Manual Transmission")

# boxplot
ggplot(cars, aes(am,mpg)) +
        geom_boxplot() +
        labs(x="Transmission", y="Miles per Gallon", title= "MPG Auto vs Manual Transmission")

The visualizations show that manual transmissions have better MPG than automatic transmissions. Is the difference statistically significant?

# split the data by transmission type
aCars <- cars[cars$am=="A",]
mCars <- cars[cars$am=="M",]
# Welch two-sample t-test on mean MPG
test <- t.test(aCars$mpg,mCars$mpg)
test$p.value
## [1] 0.001373638

Since the p-value is less than 0.05, we reject the null hypothesis of equal means: there is a significant difference between the average MPG of manual and automatic transmissions.
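
The same test can also be written with the formula interface of `t.test()`, which avoids splitting the data by hand and returns a confidence interval for the difference in means. This is a sketch, again assuming `cars` is built from `mtcars` as described earlier; the name `welch` is illustrative only.

```r
# Assumption: `cars` is mtcars with am recoded as a factor (A/M)
cars <- mtcars
cars$am <- factor(cars$am, levels = c(0, 1), labels = c("A", "M"))

# Welch two-sample t-test (unequal variances is the t.test default)
welch <- t.test(mpg ~ am, data = cars)

welch$p.value    # p-value of the test
welch$estimate   # group means for A and M
welch$conf.int   # 95% confidence interval for mean(A) - mean(M)
```

Because the interval is for automatic minus manual, both of its limits are negative when manual cars have the higher mean.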

Linear modeling

Model 1: The intuitive approach

Our first model uses our general knowledge of cars to pick the variables most likely to account for MPG variance: cyl, wt, am, and hp.

# intuition model
fit1 <- lm(mpg~am + cyl + wt + hp, cars)
summary(fit1)
## 
## Call:
## lm(formula = mpg ~ am + cyl + wt + hp, data = cars)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -3.4765 -1.8471 -0.5544  1.2758  5.6608 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 36.14654    3.10478  11.642 4.94e-12 ***
## amM          1.47805    1.44115   1.026   0.3142    
## cyl         -0.74516    0.58279  -1.279   0.2119    
## wt          -2.60648    0.91984  -2.834   0.0086 ** 
## hp          -0.02495    0.01365  -1.828   0.0786 .  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2.509 on 27 degrees of freedom
## Multiple R-squared:  0.849,  Adjusted R-squared:  0.8267 
## F-statistic: 37.96 on 4 and 27 DF,  p-value: 1.025e-10

Our first model is good, but the F-statistic of 37.96 could be better, and only wt is statistically significant at the 0.05 level; the p-values for am, cyl, and hp are all above 0.05. The multiple R-squared of 0.849 means the model accounts for about 85% of the MPG variance, which is not bad.

Model 2: Let the computer select the best variables.

fitStep <- step(lm(data = cars, mpg ~ .), trace = 0, steps = 10000)
summary(fitStep)
## 
## Call:
## lm(formula = mpg ~ wt + qsec + am, data = cars)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -3.4811 -1.5555 -0.7257  1.4110  4.6610 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   9.6178     6.9596   1.382 0.177915    
## wt           -3.9165     0.7112  -5.507 6.95e-06 ***
## qsec          1.2259     0.2887   4.247 0.000216 ***
## amM           2.9358     1.4109   2.081 0.046716 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2.459 on 28 degrees of freedom
## Multiple R-squared:  0.8497, Adjusted R-squared:  0.8336 
## F-statistic: 52.75 on 3 and 28 DF,  p-value: 1.21e-11

Our second model is also good, and its F-statistic of 52.75 is better. The step procedure removed hp and cyl and added qsec, making all three predictors significant at the 0.05 level. The multiple R-squared is essentially the same for both models (0.8497 vs. 0.849).

Which model is better?

# which model is better
anova(fit1,fitStep)
## Analysis of Variance Table
## 
## Model 1: mpg ~ am + cyl + wt + hp
## Model 2: mpg ~ wt + qsec + am
##   Res.Df    RSS Df Sum of Sq F Pr(>F)
## 1     27 170.00                      
## 2     28 169.29 -1   0.71184

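The anova table reports no F statistic or p-value for this comparison. The two models are not nested (model 2 drops cyl and hp but adds qsec), so the F-test does not apply. The residual sums of squares are nearly identical (170.00 vs. 169.29), so neither model fits the data noticeably better.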

We decided to go with model 2 because it has the higher F-statistic and fewer variables to complicate the model.
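
One comparison that does not require either model to be nested in the other is an information criterion such as AIC, which is also the criterion `step()` uses by default. The sketch below refits both models under the assumption that `cars` comes from `mtcars`; the name `fit2` is illustrative and stands in for the stepwise model.

```r
# Assumption: `cars` is mtcars with am recoded as a factor (A/M)
cars <- mtcars
cars$am <- factor(cars$am, levels = c(0, 1), labels = c("A", "M"))

fit1 <- lm(mpg ~ am + cyl + wt + hp, data = cars)  # intuition model
fit2 <- lm(mpg ~ wt + qsec + am, data = cars)      # model selected by step()

# Lower AIC is better; the penalty grows with the number of parameters
AIC(fit1, fit2)

# Adjusted R-squared also penalizes extra predictors
c(fit1 = summary(fit1)$adj.r.squared,
  fit2 = summary(fit2)$adj.r.squared)
```

Both measures reward fit while penalizing model size, so they speak directly to the trade-off behind the choice of model 2.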

Residuals

The Residuals vs Fitted plot shows no clear pattern, which supports independence. The Normal Q-Q plot shows that the residuals are approximately normally distributed. The Scale-Location plot shows the residuals spread randomly across the range of fitted values. The Residuals vs Leverage plot shows no outliers to be concerned about.

# plot residuals
par(mfrow = c(2,2))
plot(fitStep)
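
The visual checks can be complemented with numeric ones. The following is a minimal sketch, not part of the original analysis: it assumes `cars` comes from `mtcars`, refits the selected model under the illustrative name `fit`, and uses the Shapiro-Wilk test, hat values, and Cook's distance from base R.

```r
# Assumption: `cars` is mtcars with am recoded as a factor (A/M)
cars <- mtcars
cars$am <- factor(cars$am, levels = c(0, 1), labels = c("A", "M"))
fit <- lm(mpg ~ wt + qsec + am, data = cars)

# Shapiro-Wilk test: a large p-value gives no evidence against normal residuals
sw <- shapiro.test(residuals(fit))
sw$p.value

# Leverage (hat values) and Cook's distance flag potentially influential cars
lev  <- hatvalues(fit)
cook <- cooks.distance(fit)
head(sort(cook, decreasing = TRUE), 3)
```

Cars at the top of the sorted Cook's distance list are the ones worth a second look in the Residuals vs Leverage plot.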

Inference

At the 95% confidence level, our model estimates that a manual transmission gets between 0.046 and 5.83 MPG more than an automatic transmission, with a point estimate of 2.94 MPG.

# Extract the coefficient table from the model summary
Cof <- summary(fitStep)$coefficients
Cof
##              Estimate Std. Error   t value     Pr(>|t|)
## (Intercept)  9.617781  6.9595930  1.381946 1.779152e-01
## wt          -3.916504  0.7112016 -5.506882 6.952711e-06
## qsec         1.225886  0.2886696  4.246676 2.161737e-04
## amM          2.935837  1.4109045  2.080819 4.671551e-02
# Calculate 95% confidence intervals for the estimated coefficients
cint <- data.frame(From = NULL, To = NULL, Estimate = NULL)
for (i in 1:4)
        cint <- rbind(cint, c(Cof[i,1] + c(-1, 1) * qt(.975, df = fitStep$df.residual) * Cof[i, 2], 
                            Cof[i,1]))
names(cint) <- c('From', 'To', 'Estimate')
cint[4,]
##         From       To Estimate
## 4 0.04573031 5.825944 2.935837
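
Base R's `confint()` computes the same t-based intervals for a linear model directly. This sketch assumes `cars` comes from `mtcars` and refits the selected model under the illustrative name `fit`.

```r
# Assumption: `cars` is mtcars with am recoded as a factor (A/M)
cars <- mtcars
cars$am <- factor(cars$am, levels = c(0, 1), labels = c("A", "M"))
fit <- lm(mpg ~ wt + qsec + am, data = cars)

# 95% confidence intervals for all coefficients
ci <- confint(fit, level = 0.95)

# Interval for the manual-transmission effect
ci["amM", ]
```

Indexing by the coefficient name `amM` is less fragile than indexing by row position if the model formula changes.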

Conclusion

Holding weight and acceleration constant, manual transmission cars get an estimated 2.94 MPG better fuel efficiency. However, because the dataset has only 32 observations, there is a possibility of overfitting the model. With a larger data set, it would also be easier to compare automatic and manual transmissions of each car type.