knitr::opts_chunk$set(echo = TRUE)

Synopsis/Executive Summary

It has traditionally been believed that cars with manual transmissions are more fuel-efficient than cars with automatic transmissions. As technology advances, is a manual transmission still more fuel-efficient, in terms of mpg, than an automatic one? In this report we explore the 2012 fuel economy guide produced by the US Environmental Protection Agency and US Department of Energy to answer this question. The dataset was downloaded via the URL “http://www.fueleconomy.gov/feg/download.shtml” and saved in CSV format. The transmission column lists many combinations of transmission type and number of gears, so for clarity I add a column that classifies each car as having either an automatic or a manual transmission. Likewise, I group engine displacement into 1-, 2- and 3-liter families and remove electric cars to make the comparison clearer.
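The two recoding steps described above can be sketched on a few made-up rows. This is a minimal sketch, not the report's data: the transmission codes "Auto-6" and "Auto-AV" and the displacement values are hypothetical. It mirrors the report's own rules (only "Man-5" and "Man-6" count as manual, and only the listed displacements map to a family) and shows a side effect of those rules: any displacement outside the lists becomes NA, so na.omit() drops it along with the electric cars.

```r
# Hypothetical rows; the transmission codes are invented to resemble the Trans column
trans <- c("Auto-6", "Man-5", "Man-6", "Auto-AV")
displ <- c(1.8, 2.4, 3.5, 4.0)   # engine displacement in liters

# Transmission: 5- and 6-speed manuals are "Manual", everything else is "Auto"
trans1 <- ifelse(trans %in% c("Man-5", "Man-6"), "Manual", "Auto")

# Displacement: map the listed sizes to the 1-, 2- or 3-liter family, else NA
displ1 <- ifelse(displ %in% c(1.8, 1.6, 1.5, 1.4, 1.3), 1,
          ifelse(displ %in% c(2.5, 2.4, 2.2, 2), 2,
          ifelse(displ %in% c(3.5, 3), 3, NA)))

toy_cars <- na.omit(data.frame(trans, displ, trans1, displ1))
print(toy_cars)   # the 4.0-liter row has displ1 = NA and is dropped
```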

Part 1: Dataset for discussion

The code chunk below loads the data and plots combined mpg for cars with automatic and manual transmissions.

library(dplyr)
setwd("~/Desktop/DOE_fueleconomy") # set working directory to the data folder
dataset <- read.csv("fuel econ_2012.csv")
dataSelect <- dataset[1:432,] # keep rows 1-432; the CSV contains many blank rows
dataSelect <- select(dataSelect,Model,Displ,Cyl,Trans,City.MPG,Hwy.MPG,Cmb.MPG)
dataSelect <- filter(dataSelect, Trans != "Other-6") # drop the "Other-6" transmission type
# Trans1: 5- and 6-speed manuals are "Manual", all other transmissions are "Auto"
dataSelect <- mutate(dataSelect, Trans1=ifelse(Trans %in% c("Man-5","Man-6"),"Manual","Auto"))
# Displ1: group displacement (liters) into 1-, 2- and 3-liter families; other values become NA
dataSelect <- mutate(dataSelect, Displ1=ifelse(Displ %in% c(1.8,1.6,1.5,1.4,1.3),1,ifelse(Displ %in% c(2.5,2.4,2.2,2),2,ifelse(Displ %in% c(3.5,3),3,NA))))
dataSelect <- na.omit(dataSelect) # drop rows with any NA: electric cars and displacements outside the three families
dataSelect$Trans1 <- factor(dataSelect$Trans1)
dataSelect$Displ1 <- factor(dataSelect$Displ1)
dataSelect$Cmb.MPG <- as.numeric(as.character(dataSelect$Cmb.MPG)) # non-numeric entries become NA
## Warning: NAs introduced by coercion
boxplot(Cmb.MPG~Trans1,data=dataSelect, sub="Fig 1 2012 Car Mileage Data", xlab="Transmission", ylab="MPG", col=c("gold","coral"))

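As Fig 1 shows, the median combined mpg is about the same for both transmission types, but the automatic group has a wider spread and a long upper tail, so on average automatic cars are somewhat more fuel-efficient than manual cars.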
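A boxplot marks medians, not means, so two groups can share a median while one has a higher mean because of a long upper tail. A minimal sketch with hypothetical mpg values (not taken from the 2012 guide) makes the distinction concrete using base R's tapply():

```r
# Hypothetical mpg values, not taken from the 2012 guide
toy <- data.frame(
  mpg   = c(26, 28, 29, 30, 31, 50,    # automatic, with one high-mpg car
            28, 29, 29, 30, 30, 34),   # manual
  trans = rep(c("Auto", "Manual"), each = 6)
)

medians <- tapply(toy$mpg, toy$trans, median)  # what the boxplot's center line shows
means   <- tapply(toy$mpg, toy$trans, mean)    # what a no-intercept model like fit1 estimates
print(rbind(median = medians, mean = means))
```

Both toy groups have a median of 29.5 mpg, yet the single 50-mpg car lifts the automatic mean to about 32.3 mpg against 30 mpg for the manual group.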

# Split the data by transmission type
temp_manual <- subset(dataSelect,Trans1=="Manual")
temp_auto <- subset(dataSelect,Trans1=="Auto")
# Quartiles and maximum of combined mpg for each group
quant_manual <- quantile(temp_manual$Cmb.MPG, c(0.25,0.5,0.75,1),na.rm=TRUE)
quant_auto <- quantile(temp_auto$Cmb.MPG, c(0.25,0.5,0.75,1),na.rm=TRUE)

# Combine into a labelled two-row table
temp <- rbind(quant_auto,quant_manual)
cnames <- c("25%","50%","75%","100%")
colnames(temp)<-cnames
rnames <- c("Auto","Manual")
rownames(temp) <- rnames
names(dimnames(temp)) <- list("","     Table 1")
temp
##              Table 1
##          25% 50% 75% 100%
##   Auto    26  29  31   50
##   Manual  28  29  30   34

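Table 1 lists the quartiles and maximum of combined mpg for automatic and manual transmissions. The medians are equal (29 mpg), but automatic cars span a wider range, from a lower first quartile (26 vs. 28) to a much higher maximum (50 vs. 34), which supports the exploratory analysis of Fig 1.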
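A more compact way to build this kind of table is to split the mpg values by group and apply quantile() to each piece. This is a sketch on a handful of hypothetical values rather than the real data; with the report's data one would pass dataSelect$Cmb.MPG and dataSelect$Trans1 instead of the toy vectors.

```r
# Hypothetical toy values standing in for Cmb.MPG and Trans1
mpg   <- c(24, 26, 29, 31, 50,   27, 28, 29, 30, 34)
trans <- rep(c("Auto", "Manual"), each = 5)

# One row per transmission group, one column per requested quantile
tab <- t(sapply(split(mpg, trans), quantile,
                probs = c(0.25, 0.5, 0.75, 1), na.rm = TRUE))
print(tab)
```

split() returns one vector per group, sapply() applies quantile() to each (passing probs and na.rm through), and t() turns the groups into rows, so the row and column labels come for free instead of being set by hand.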

Part 2: Linear models

The code chunk below fits combined city/highway mpg with transmission type as the only predictor.

fit1 <- lm(Cmb.MPG~Trans1-1,data=dataSelect,na.action=na.omit) # no intercept: one coefficient per transmission group
summary(fit1)$coef
##              Estimate Std. Error   t value      Pr(>|t|)
## Trans1Auto   30.07451  0.2667151 112.75892 5.818914e-302
## Trans1Manual 28.94326  0.3586808  80.69364 4.058616e-247

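Without any other predictors, cars with automatic transmissions have a higher mean combined mpg than cars with manual transmissions (30.07 vs. 28.94). Because the model has no intercept, each coefficient is simply the mean mpg of its group, and the very small p-values only test whether each mean differs from zero, not whether the two means differ from each other.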
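A minimal sketch on hypothetical toy data checks that a no-intercept model with a single factor reproduces the group means, and shows one standard way to test the auto-versus-manual difference itself: refit with an intercept, so the `transManual` coefficient is the difference in means and its p-value tests that difference directly. The data and object names here are illustrative only.

```r
# Hypothetical toy data: combined mpg for five cars per transmission type
toy <- data.frame(
  mpg   = c(26, 29, 31, 30, 34,  28, 29, 30, 28, 29),
  trans = factor(rep(c("Auto", "Manual"), each = 5))
)

# No intercept (as in fit1): each coefficient equals its group mean
fit_means   <- lm(mpg ~ trans - 1, data = toy)
group_means <- tapply(toy$mpg, toy$trans, mean)

# With an intercept, 'transManual' is the manual-minus-auto difference,
# and its p-value tests whether the two group means differ
fit_diff <- lm(mpg ~ trans, data = toy)
p_diff   <- summary(fit_diff)$coefficients["transManual", "Pr(>|t|)"]

print(coef(fit_means))   # transAuto 30, transManual 28.8
print(coef(fit_diff))    # (Intercept) 30, transManual -1.2
```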

fit2 <- lm(Cmb.MPG~Trans1+Displ1-1,data=dataSelect,na.action=na.omit) # Trans1 coefficients now refer to the 1-liter family
summary(fit2)$coef
##               Estimate Std. Error   t value      Pr(>|t|)
## Trans1Auto   32.185381  0.3355839 95.908597 2.868027e-274
## Trans1Manual 30.448760  0.3681576 82.705788 3.044849e-250
## Displ12      -3.537920  0.3991551 -8.863523  2.751333e-17
## Displ13      -4.549018  1.2166292 -3.739034  2.122521e-04
fit3 <- lm(Cmb.MPG~Trans1+Displ1+Cyl-1,data=dataSelect,na.action=na.omit)
summary(fit3)$coef
##               Estimate Std. Error    t value      Pr(>|t|)
## Trans1Auto   32.145382  0.3337477 96.3164136 5.521271e-274
## Trans1Manual 30.501105  0.3663825 83.2493546 1.952668e-250
## Displ12      -3.268468  0.4086610 -7.9979917  1.454956e-14
## Displ13      -0.632104  4.0433340 -0.1563324  8.758519e-01
## Cyl5         -2.354776  0.9129857 -2.5792034  1.026827e-02
## Cyl6         -3.876914  3.8640037 -1.0033413  3.163183e-01

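After adjusting for engine displacement (model fit2), automatic transmissions still have higher combined mpg than manual transmissions: the coefficients 32.19 and 30.45 are the predicted mpg for automatic and manual cars in the 1-liter family, a gap of about 1.7 mpg compared with 1.1 mpg in fit1. Adding displacement therefore does not change the conclusion. Displacement itself matters: relative to the 1-liter family, predicted mpg drops by about 3.5 mpg for 2-liter engines and about 4.5 mpg for 3-liter engines, so moving from the 2-liter to the 3-liter family costs only about 1 mpg. Adding the number of cylinders (model fit3) barely changes the transmission estimates (32.15 vs. 30.50), although the 3-liter coefficient shrinks to -0.63 and is no longer significant (p = 0.88), likely because displacement and cylinder count carry overlapping information. We use an ANOVA (a nested-model F-test) to assess whether adding displacement significantly improves on the transmission-only model. The null hypothesis is that the added displacement coefficients are both zero.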
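The coefficient comparisons in the paragraph above can be checked by hand before running the ANOVA. This sketch does not refit anything: it copies the rounded estimates printed by summary(fit1)$coef and summary(fit2)$coef into named vectors (in a live session, coef(fit1) and coef(fit2) would supply them at full precision).

```r
# Estimates copied from the summary(fit1)$coef and summary(fit2)$coef output
fit1_b <- c(Trans1Auto = 30.07451, Trans1Manual = 28.94326)
fit2_b <- c(Trans1Auto = 32.185381, Trans1Manual = 30.448760,
            Displ12 = -3.537920, Displ13 = -4.549018)

# Auto-minus-manual gap without and with displacement in the model
gap_fit1 <- fit1_b[["Trans1Auto"]] - fit1_b[["Trans1Manual"]]
gap_fit2 <- fit2_b[["Trans1Auto"]] - fit2_b[["Trans1Manual"]]

# Change in predicted mpg moving from the 2-liter to the 3-liter family
drop_2_to_3 <- fit2_b[["Displ13"]] - fit2_b[["Displ12"]]

# Predicted mpg for an automatic car in the 2-liter family (1-liter baseline + offset)
auto_2l <- fit2_b[["Trans1Auto"]] + fit2_b[["Displ12"]]

round(c(gap_fit1 = gap_fit1, gap_fit2 = gap_fit2,
        drop_2_to_3 = drop_2_to_3, auto_2l = auto_2l), 2)
# gap_fit1 1.13, gap_fit2 1.74, drop_2_to_3 -1.01, auto_2l 28.65
```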

anova(fit1,fit2)
## Analysis of Variance Table
## 
## Model 1: Cmb.MPG ~ Trans1 - 1
## Model 2: Cmb.MPG ~ Trans1 + Displ1 - 1
##   Res.Df    RSS Df Sum of Sq      F    Pr(>F)    
## 1    394 7147.1                                  
## 2    392 5897.0  2    1250.2 41.552 < 2.2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
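The F statistic in this table compares the drop in residual sum of squares per added coefficient with the residual mean square of the larger model: F = ((RSS1 - RSS2)/q) / (RSS2/df2), where q is the number of added coefficients. A sketch that recomputes it from the rounded values printed in the table, using base R's pf() for the upper-tail p-value:

```r
# Rounded values copied from the anova(fit1, fit2) table
rss1 <- 7147.1; df_res1 <- 394   # Model 1: transmission only
rss2 <- 5897.0; df_res2 <- 392   # Model 2: transmission + displacement family
n_added <- df_res1 - df_res2     # number of added coefficients (Displ12, Displ13)

# Nested-model F test: extra sum of squares per added coefficient,
# divided by the residual mean square of the larger model
F_stat <- ((rss1 - rss2) / n_added) / (rss2 / df_res2)
p_val  <- pf(F_stat, df1 = n_added, df2 = df_res2, lower.tail = FALSE)

c(F = F_stat, p = p_val)
```

Because the printed RSS values are rounded, the result agrees with the table's F = 41.552 to about two decimal places, and the p-value falls below the 2.2e-16 threshold that R prints as "< 2.2e-16".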

The p-value is below 2.2e-16 (R prints any smaller p-value this way), and the three asterisks (***) mark significance at the 0.001 level, so we reject the null hypothesis: adding displacement significantly improves the fit. In other words, if displacement had no effect on mpg, an F statistic as large as 41.55 would be extremely unlikely. Lastly, we produce diagnostic plots to check for heteroscedasticity, non-normal residuals and influential observations.

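layout(matrix(c(1,2,3,4),2,2)) # arrange the four diagnostic plots in a 2 x 2 grid
plot(fit2)                     # residuals vs fitted, Q-Q, scale-location, residuals vs leverage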

The Residuals vs Fitted plot shows residuals spread fairly evenly around a horizontal line with no distinct pattern, which suggests that the linearity and constant-variance assumptions are reasonably met. The Normal Q-Q plot shows that the points for the Toyota Prius observations (376 through 383) curve away from the reference line at the extremes, so the residuals are not perfectly normal. The Scale-Location plot shows a roughly horizontal line with randomly spread points. In the Residuals vs Leverage plot, all cases lie well inside the Cook's distance lines, so although the Prius observations have extreme mpg values, they do not appear to be influential in determining the regression line. Overall, the diagnostics do not undermine the linear model, and the model with both transmission and engine displacement explains mpg better than transmission alone.
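The Cook's distance check can also be done numerically rather than by eye. This is a minimal sketch on hypothetical data: y = 2x exactly, except for one point at the edge of the x range that sits far above the line, giving it both high leverage and a large residual. cooks.distance() and hatvalues() are base R (stats) functions, and plot.lm draws its Cook's distance contours at 0.5 and 1 by default; with the report's model one would call cooks.distance(fit2) instead.

```r
# Hypothetical toy data: y = 2x exactly, except the last point, which sits
# at the edge of the x range (high leverage) and far above the line
x <- 1:10
y <- 2 * x
y[10] <- 40
toy_fit <- lm(y ~ x)

cd  <- cooks.distance(toy_fit)   # one value per observation
lev <- hatvalues(toy_fit)        # leverage of each observation
print(round(cbind(leverage = lev, cooks = cd), 3))
print(which(cd > 0.5))           # observations outside the 0.5 contour of plot.lm
```

Here only observation 10 exceeds the 0.5 contour (its Cook's distance is about 2.1), which is the kind of point the Residuals vs Leverage plot would flag; an outlier near the middle of the x range would have low leverage and much less influence.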