Manual vs. Automatic Transmission: Fuel Efficiency


Executive Summary

  • Subject Matter: This report analyses and evaluates manual vs. automatic transmission with regard to fuel efficiency, measured in MPG (miles per gallon).

  • Methods of analysis: The difference in fuel efficiency is measured and validated with a multiple linear regression model. Several nested model fits are compared, and the best one is chosen for the analysis. The data come from the mtcars data set in R's built-in datasets package, which records 11 variables (such as mpg and weight) for 32 cars. All graphs from the exploratory and final data analysis are in the Appendix.

  • Findings & Conclusion: The report finds a significant difference in fuel economy between manual and automatic transmission cars. Manual transmission cars appear to be more fuel-efficient than automatic ones.

  • Limitations: Several variables in the data set are highly correlated with one another, which suggests the data were not collected under a properly randomised design. This multicollinearity also limits how many variables can be included together in the regression model.

Loading R Packages, mtcars data set and transforming variables

library(dplyr)
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
library(ggplot2)
## Warning: package 'ggplot2' was built under R version 3.2.4
library(car)
## Warning: package 'car' was built under R version 3.2.4
data("mtcars")

mtcars$cyl <- factor(mtcars$cyl)
mtcars$vs <- factor(mtcars$vs)
mtcars$gear <- factor(mtcars$gear)
mtcars$carb <- factor(mtcars$carb)
mtcars$am <- factor(mtcars$am,labels=c('Automatic','Manual'))
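In mtcars, `am` is coded 0 for automatic and 1 for manual, and `factor()` attaches the labels in the sorted order of the original values, so 0 becomes "Automatic" and 1 becomes "Manual". A minimal check, run on a local copy (the name `mt` is just a variable chosen for this illustration) so the transformed `mtcars` above is not overwritten:

```r
# Work on a local copy so the session's transformed mtcars is left untouched
mt <- datasets::mtcars

# Original coding: 0 = automatic, 1 = manual
print(table(mt$am))

# factor() sorts the unique values (0, 1) and attaches the labels in that order
mt$am <- factor(mt$am, labels = c("Automatic", "Manual"))
am_counts <- table(mt$am)
print(am_counts)
```

The counts should be unchanged by the relabelling; only the level names differ.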

Exploratory Data Analysis

First, we plot a scatterplot matrix of all variables (Figure 1.1 in the Appendix) to examine how they relate to each other. From the plot, variables such as cyl, disp, hp, drat, wt, vs and am appear to be strongly related to mpg. We examine these variables further during model selection, when deciding which variables to include in or exclude from the model.
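The scatterplot matrix is a visual screen; the same relationships can be summarised numerically with Pearson correlations. This sketch assumes the original numeric mtcars, so columns such as cyl, vs and am are treated as numeric codes, which makes it only a rough guide for those variables:

```r
# Numeric copy of the data (before any factor conversion)
mt <- datasets::mtcars

# Pearson correlation of every variable with mpg, ordered by absolute strength
r <- cor(mt)[, "mpg"]
r <- r[names(r) != "mpg"]
print(round(r[order(-abs(r))], 3))
```

Weight (wt) turns out to have the strongest linear association with mpg, which is one reason it is tried in the regression models later.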

Figure 1.2 in the Appendix is a boxplot with am on the x-axis and mpg on the y-axis. It shows clearly higher mpg for cars with manual transmission. In the next section we check whether this still holds when other important variables are held constant, by fitting multiple regression models.
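The boxplot can be backed up with the group summaries it displays. A minimal sketch computing the mean and median mpg per transmission type on a local copy of the data (`mt` is an illustrative name):

```r
mt <- datasets::mtcars
mt$am <- factor(mt$am, labels = c("Automatic", "Manual"))

# Mean and median mpg for each transmission type
group_means <- tapply(mt$mpg, mt$am, mean)
group_medians <- tapply(mt$mpg, mt$am, median)
print(rbind(mean = group_means, median = group_medians))
```

The means are the same ones reported by the t-test in the Inference section; this comparison ignores every other variable.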

Regression Analysis

In this section we build multiple linear regression models on different sets of variables using a nested-model approach: each model adds one variable to the previous one, and the models are compared with anova. We also check how the variance inflation factor (VIF) changes as new variables are included.

fit1<-lm(mpg ~ factor(am),data = mtcars)
fit2<-lm(mpg ~ hp + factor(am),data = mtcars)
fit3<-lm(mpg ~ hp + wt +factor(am),data = mtcars)
anova(fit1,fit2,fit3)
## Analysis of Variance Table
## 
## Model 1: mpg ~ factor(am)
## Model 2: mpg ~ hp + factor(am)
## Model 3: mpg ~ hp + wt + factor(am)
##   Res.Df    RSS Df Sum of Sq      F    Pr(>F)    
## 1     30 720.90                                  
## 2     29 245.44  1    475.46 73.841 2.445e-09 ***
## 3     28 180.29  1     65.15 10.118  0.003574 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

The ANOVA table gives the F statistic and its p-value for each added variable. Adding hp to the am-only model reduces the residual sum of squares (RSS) from 720.90 to 245.44 (F = 73.84, p = 2.4e-09), and adding wt reduces it further to 180.29 (F = 10.12, p = 0.0036). So including wt appears to improve significantly on the model with hp and am alone. Next we check how the VIFs change when moving from model 2 to model 3.
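The F statistics in the table can be reproduced from the RSS column. A sketch, assuming `am` is left as numeric 0/1 (this gives the same fits as `factor(am)`, since am has only two levels); `anova()` divides each drop in RSS per degree of freedom by the residual mean square of the largest model:

```r
mt <- datasets::mtcars  # am stays numeric 0/1; fits match the factor(am) versions

fit1 <- lm(mpg ~ am, data = mt)
fit2 <- lm(mpg ~ hp + am, data = mt)
fit3 <- lm(mpg ~ hp + wt + am, data = mt)

# Residual sum of squares of each nested model
rss <- sapply(list(fit1, fit2, fit3), deviance)

# Common denominator: residual mean square of the largest model (28 df)
sigma2_full <- rss[3] / df.residual(fit3)

# Each step adds one term, i.e. one degree of freedom
F_hp <- (rss[1] - rss[2]) / 1 / sigma2_full
F_wt <- (rss[2] - rss[3]) / 1 / sigma2_full
p_wt <- pf(F_wt, df1 = 1, df2 = df.residual(fit3), lower.tail = FALSE)
print(c(F_hp = F_hp, F_wt = F_wt, p_wt = p_wt))
```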

vif(fit2)
##         hp factor(am) 
##   1.062867   1.062867
vif(fit3)
##         hp         wt factor(am) 
##   2.088124   3.774838   2.271082

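Comparing the VIFs of model 2 and model 3, adding wt roughly doubles the VIFs of hp (1.06 to 2.09) and am (1.06 to 2.27), and wt itself has a VIF of 3.77, reflecting its correlation with hp and am. Because including wt inflates the variance of the other coefficient estimates this much, we choose model 2 (mpg ~ hp + am) as our final model.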
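The VIF of a predictor is 1 / (1 - R²), where R² comes from regressing that predictor on the other predictors. The helper below, `vif_by_hand`, is hypothetical and written only for this illustration; it assumes numeric 0/1 am, for which the ordinary VIF equals the GVIF that `car::vif` reports for a one-degree-of-freedom term:

```r
mt <- datasets::mtcars

# Hypothetical helper: VIF_j = 1 / (1 - R_j^2) for each predictor j
vif_by_hand <- function(data, predictors) {
  sapply(predictors, function(p) {
    others <- setdiff(predictors, p)
    # Regress predictor p on the remaining predictors
    r2 <- summary(lm(reformulate(others, response = p), data = data))$r.squared
    1 / (1 - r2)
  })
}

vif2 <- vif_by_hand(mt, c("hp", "am"))
vif3 <- vif_by_hand(mt, c("hp", "wt", "am"))
print(round(vif2, 6))
print(round(vif3, 6))
```

With only two predictors, both VIFs in model 2 equal 1 / (1 - r²) with r = cor(hp, am), which is why they are identical in the output above.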

Residuals and Diagnostics

In this section, we plot residuals against fitted values to check the constant-variance assumption, and use the normal Q-Q plot to check normality of the residuals. Finally, we look for high-leverage and influential points that may be driving the regression fit.

par(mfrow = c(2,2))

plot(fit2)

lev <- hatvalues(fit2)
tail(sort(lev),6)
##     Duster 360     Camaro Z28    Honda Civic      Merc 240D Ford Pantera L 
##      0.1050017      0.1050017      0.1177812      0.1230556      0.2141234 
##  Maserati Bora 
##      0.3929383
inf <- dfbetas(fit2)
tail(sort(inf),6)
## [1] 0.2923872 0.3556437 0.4016702 0.4018393 0.5642167 0.9219857

The cars flagged above also appear among the points labelled in the residual plots, so the leverage and influence measures are consistent with the visual diagnostics.
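Note that `dfbetas()` returns a matrix with one column per coefficient, so `sort(inf)` flattens it, drops the car names and ranks signed values. A sketch that keeps the names and ranks cars by the absolute influence on the transmission coefficient, assuming numeric 0/1 am (so the column is called "am" here rather than "factor(am)Manual"):

```r
mt <- datasets::mtcars
fit2 <- lm(mpg ~ hp + am, data = mt)

# Leverage: diagonal of the hat matrix; the average is p/n = 3/32
lev <- hatvalues(fit2)
print(tail(sort(lev), 6))

# One dfbetas column per coefficient; rank cars within the am column,
# by absolute value, since large negative values are just as influential
dfb <- dfbetas(fit2)
am_influence <- sort(abs(dfb[, "am"]), decreasing = TRUE)
print(head(am_influence, 6))
```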

Interpretation of Coefficients

mtcars$hp1<-mtcars$hp-mean(mtcars$hp)
finalmodel<-lm(mpg ~ hp1 + factor(am),data = mtcars)
summary(finalmodel)
## 
## Call:
## lm(formula = mpg ~ hp1 + factor(am), data = mtcars)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -4.3843 -2.2642  0.1366  1.6968  5.8657 
## 
## Coefficients:
##                   Estimate Std. Error t value Pr(>|t|)    
## (Intercept)      17.946809   0.675884  26.553  < 2e-16 ***
## hp1              -0.058888   0.007857  -7.495 2.92e-08 ***
## factor(am)Manual  5.277085   1.079541   4.888 3.46e-05 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2.909 on 29 degrees of freedom
## Multiple R-squared:  0.782,  Adjusted R-squared:  0.767 
## F-statistic: 52.02 on 2 and 29 DF,  p-value: 2.55e-10

First, to make the intercept more interpretable, we have centred hp by subtracting its mean from each value. Next, the t value for the third coefficient (b2, the manual transmission effect) is 4.89 (p = 3.5e-05), so we reject the null hypothesis b2 = 0. That means there is a significant difference between the fuel efficiency of cars with manual vs. automatic transmission after adjusting for horsepower.

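  • Intercept b0: For an automatic transmission car with average horsepower, the expected fuel efficiency is 17.95 mpg.

  • Slope b1: Each additional unit of horsepower is associated with a decrease of about 0.059 mpg, holding transmission type constant.

  • Coefficient b2: A manual transmission car is expected to get 5.28 mpg more than an automatic car with the same horsepower, i.e. 17.95 + 5.28 = 23.22 mpg for a car with average horsepower.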
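The t test for b2 and the interpretations above can be checked directly from the fitted model. A minimal sketch on a local copy of the data (`mt`, `newdata` and the other names are illustrative); with `am` as a factor in the formula, the coefficient is named "amManual" rather than "factor(am)Manual":

```r
# Rebuild the final model on a local copy of the data
mt <- datasets::mtcars
mt$am <- factor(mt$am, labels = c("Automatic", "Manual"))
mt$hp1 <- mt$hp - mean(mt$hp)  # centred horsepower, as in the report
finalmodel <- lm(mpg ~ hp1 + am, data = mt)

# t value for b2 = estimate / standard error, on the residual degrees of freedom
est <- coef(summary(finalmodel))["amManual", ]
t_b2 <- est[["Estimate"]] / est[["Std. Error"]]
p_b2 <- 2 * pt(-abs(t_b2), df = df.residual(finalmodel))

# 95% confidence interval for b2; excluding 0 matches rejecting H0: b2 = 0 at 5%
ci_b2 <- confint(finalmodel)["amManual", ]

# hp1 = 0 is a car with average horsepower, so the predictions are b0 and b0 + b2
newdata <- data.frame(hp1 = 0,
                      am = factor(c("Automatic", "Manual"), levels = levels(mt$am)))
pred <- predict(finalmodel, newdata = newdata)
names(pred) <- levels(mt$am)

print(c(t = t_b2, p = p_b2, ci_b2))
print(round(pred, 2))
```

Because hp is centred, the two predictions are simply b0 and b0 + b2, which is what makes the intercept directly readable.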

Inference

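Finally, we run a Welch two-sample t-test of mpg by transmission type, without holding any other variable constant. It also suggests a significant difference in fuel mileage between manual and automatic cars: the group means are 24.39 and 17.15 mpg, a difference of about 7.24 mpg (p = 0.0014).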

t.test(mpg ~ am, data = mtcars)
## 
##  Welch Two Sample t-test
## 
## data:  mpg by am
## t = -3.7671, df = 18.332, p-value = 0.001374
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -11.280194  -3.209684
## sample estimates:
## mean in group Automatic    mean in group Manual 
##                17.14737                24.39231
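The Welch statistic and its approximate degrees of freedom can be computed by hand, which shows why df is fractional (18.332). A sketch assuming numeric 0/1 am in a local copy of the data; the order "Automatic minus Manual" matches `t.test(mpg ~ am)`:

```r
mt <- datasets::mtcars
auto <- mt$mpg[mt$am == 0]
manual <- mt$mpg[mt$am == 1]

# Squared standard error of each group mean
v_auto <- var(auto) / length(auto)
v_manual <- var(manual) / length(manual)

# Welch t statistic: difference in means over its unpooled standard error
t_stat <- (mean(auto) - mean(manual)) / sqrt(v_auto + v_manual)

# Welch-Satterthwaite approximation to the degrees of freedom
df_welch <- (v_auto + v_manual)^2 /
  (v_auto^2 / (length(auto) - 1) + v_manual^2 / (length(manual) - 1))

p_value <- 2 * pt(-abs(t_stat), df = df_welch)
print(c(t = t_stat, df = df_welch, p = p_value))
```

Unlike the pooled t-test, this version does not assume equal variances in the two groups, which is R's default for `t.test()`.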

Conclusion

From our regression and inference analysis, there is a significant difference between the fuel efficiency of cars with manual vs. automatic transmission after adjusting for horsepower: a manual car is expected to get about 5.3 mpg more than an automatic car with the same horsepower. See also Figure 1.3 in the Appendix.

Appendix

Figure 1.1

pairs(mtcars, panel = panel.smooth, main = "mtcars data")

Figure 1.2

g<-ggplot(data = mtcars,aes(x = am, y = mpg,fill = am))
g<-g+geom_boxplot(color = "black",alpha = .5)+theme_bw()
g<-g+xlab("Transmission Type") + ylab("MPG (Miles per Gallon)")
g<-g+ggtitle("Fuel Efficiency: Automatic Vs. Manual Transmission")
g

Figure 1.3

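# Scatter plot of mpg against hp, coloured by transmission type
g<-ggplot(data = mtcars,aes(x = hp, y = mpg,color = am))
g<-g+geom_point(size=4,alpha = .7)
# finalmodel uses centred hp (hp1 = hp - mean(hp)), so convert its intercept to the raw hp scale
b <- coef(finalmodel)
hp_bar <- mean(mtcars$hp)
# Fitted regression line for automatic cars
g<-g+geom_abline(intercept = b[1] - b[2]*hp_bar,slope = b[2],lwd = 1)
# Fitted regression line for manual cars (shifted up by b2)
g<-g+geom_abline(intercept = b[1] + b[3] - b[2]*hp_bar,slope = b[2],lwd = 1)
# Unadjusted mean mpg of each group (horizontal lines)
g<-g+geom_abline(intercept = mean(mtcars$mpg[mtcars$am=="Manual"]),slope = 0,lwd = 1)
g<-g+geom_abline(intercept = mean(mtcars$mpg[mtcars$am=="Automatic"]),slope = 0,lwd = 1)
g<-g+xlab("Horsepower") + ylab("MPG (Miles per Gallon)")
g<-g+ggtitle("Fuel Efficiency: Automatic Vs. Manual Transmission")
g<-g+theme_bw()
g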