Executive Summary:

This report analyses the mtcars data set to answer the following questions:

1. Is an automatic or manual transmission better for MPG?

2. Quantify the MPG difference between automatic and manual transmissions.

To answer these questions, I inspected the data set and found that the columns of most interest are mpg and am (automatic/manual transmission). I also tried fitting models with all the columns, but that only inflated the standard errors of the coefficients.

Preprocessing:

#load the mtcars data set into our variable
motorTrend <- mtcars

#Look at all the variables and their summary
#summary(motorTrend)

#Inspect the structure with str(); some numeric variables (such as am) need converting to factors
str(motorTrend)
## 'data.frame':    32 obs. of  11 variables:
##  $ mpg : num  21 21 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2 ...
##  $ cyl : num  6 6 4 6 8 6 8 4 4 6 ...
##  $ disp: num  160 160 108 258 360 ...
##  $ hp  : num  110 110 93 110 175 105 245 62 95 123 ...
##  $ drat: num  3.9 3.9 3.85 3.08 3.15 2.76 3.21 3.69 3.92 3.92 ...
##  $ wt  : num  2.62 2.88 2.32 3.21 3.44 ...
##  $ qsec: num  16.5 17 18.6 19.4 17 ...
##  $ vs  : num  0 0 1 1 0 1 0 1 1 1 ...
##  $ am  : num  1 1 1 0 0 0 0 0 0 0 ...
##  $ gear: num  4 4 4 3 3 3 3 4 4 4 ...
##  $ carb: num  4 4 1 1 2 1 4 2 2 4 ...
#Convert the am variable (0 = Automatic, 1 = Manual) to a labelled factor
motorTrend$amf <- factor(motorTrend$am, labels=c("Automatic","Manual"))
motorTrend$amf
##  [1] Manual    Manual    Manual    Automatic Automatic Automatic Automatic
##  [8] Automatic Automatic Automatic Automatic Automatic Automatic Automatic
## [15] Automatic Automatic Automatic Manual    Manual    Manual    Automatic
## [22] Automatic Automatic Automatic Automatic Manual    Manual    Manual   
## [29] Manual    Manual    Manual    Manual   
## Levels: Automatic Manual

Exploratory Analysis:

MPG can probably be predicted from the amf variable, so I drew a box plot to check for a relationship. There is a clear difference in MPG between manual and automatic transmission vehicles.

See the box plot under Supporting figures.
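
The visual difference in the box plot can also be checked numerically. The following is a minimal sketch, assuming a Welch two-sample t-test is an acceptable first check; the names groupMeans and welch are introduced here for illustration and are not part of the original analysis. Because a regression on a single two-level factor reproduces the group means, the gap between the two means should match the manual coefficient of the single-variable model.

```r
# Self-contained: rebuild the labelled transmission factor from mtcars
motorTrend <- mtcars
motorTrend$amf <- factor(motorTrend$am, labels = c("Automatic", "Manual"))

# Mean MPG for each transmission type (groupMeans is an illustrative name)
groupMeans <- tapply(motorTrend$mpg, motorTrend$amf, mean)
print(round(groupMeans, 3))

# Welch two-sample t-test for a difference in mean MPG between the groups
welch <- t.test(mpg ~ amf, data = motorTrend)
print(welch$p.value)
```

A small p-value here only says the unadjusted group means differ; it does not account for weight, horsepower, or any other variable.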

Prediction:

I started with a single regressor (amf) in fit1, then expanded to fit2 by adding cyl, disp, hp, and wt, and compared the two models with the anova function to see which fits better.

#fit1 takes only amf as a regressor (same response term as fit2 so anova can compare them)
fit1 <- lm(mpg ~ amf, data=motorTrend)

#fit2 takes amf, cyl, disp, hp, wt as regressors for our model
fit2 <- lm(mpg ~ amf + factor(cyl)+ disp + hp + wt, data=motorTrend)

#comparing coefficients of fit1 and fit2
summary(fit1)$coef
##             Estimate Std. Error t value  Pr(>|t|)
## (Intercept)   17.147      1.125  15.247 1.134e-15
## amfManual      7.245      1.764   4.106 2.850e-04
summary(fit2)$coef
##               Estimate Std. Error t value  Pr(>|t|)
## (Intercept)  33.864276    2.69542 12.5637 2.668e-12
## amfManual     1.806099    1.42108  1.2709 2.155e-01
## factor(cyl)6 -3.136067    1.46909 -2.1347 4.277e-02
## factor(cyl)8 -2.717781    2.89815 -0.9378 3.573e-01
## disp          0.004088    0.01277  0.3202 7.515e-01
## hp           -0.032480    0.01398 -2.3228 2.862e-02
## wt           -2.738695    1.17598 -2.3289 2.825e-02
#Compared the models using the anova function; fit2 fits much better than fit1
#anova(fit1,fit2) 
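
The nested-model comparison mentioned in the comment can be run as in the sketch below. It is self-contained and writes both models with the same response term (mpg with data = motorTrend), on the assumption that anova() matches models by their response expression; the name comparison is illustrative only.

```r
# Self-contained: rebuild the data and both models
motorTrend <- mtcars
motorTrend$amf <- factor(motorTrend$am, labels = c("Automatic", "Manual"))

# Same response and data argument in both calls, so the models are nested
fit1 <- lm(mpg ~ amf, data = motorTrend)
fit2 <- lm(mpg ~ amf + factor(cyl) + disp + hp + wt, data = motorTrend)

# F-test: do the five extra coefficients reduce the residual sum of squares
# by more than chance would explain?
comparison <- anova(fit1, fit2)
print(comparison)
```

The second row of the table carries the F statistic and its p-value for the added regressors as a group.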

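Model fit2 estimates that MPG is higher for manual transmissions and lower with more cylinders (6 and 8, relative to 4), more horsepower, and more weight. The manual coefficient (1.81) has a p-value of 0.22, so it is not statistically significant at the 5% level once the other regressors are included.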

Conclusion:

Based on the second model, manual transmission cars are more fuel efficient than automatic cars. The manual coefficient in fit2 quantifies this: holding cyl, disp, hp, and wt fixed, a manual car is estimated to get about 1.8 MPG more than an automatic one. Residual plots for both fit1 and fit2 are shown at the bottom of this report.
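
A point estimate of 1.8 MPG is easier to interpret alongside an interval. The sketch below, which is an addition rather than part of the original analysis, uses confint() to compute a 95% confidence interval for the manual coefficient; manualCI is an illustrative name. Since the reported p-value for this coefficient is 0.2155, the 95% interval necessarily includes zero.

```r
# Self-contained: rebuild the data and the multivariable model
motorTrend <- mtcars
motorTrend$amf <- factor(motorTrend$am, labels = c("Automatic", "Manual"))
fit2 <- lm(mpg ~ amf + factor(cyl) + disp + hp + wt, data = motorTrend)

# 95% confidence interval for the manual-vs-automatic coefficient
manualCI <- confint(fit2, "amfManual", level = 0.95)
print(manualCI)
```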

Uncertainty in conclusion: The summary of model fit2 gives an R-squared value of 0.84, indicating that the model explains 84% of the variance in MPG. Explaining more of the variance would likely require more observations; the data set has only 32.
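
The R-squared figure can be read directly from the model summary, as in the sketch below. summary() on an lm fit exposes both the multiple and the adjusted R-squared; with 32 observations and six estimated slopes the two can differ noticeably, so it is worth checking which one a quoted value refers to. The name fitSummary is illustrative.

```r
# Self-contained: rebuild the data and the multivariable model
motorTrend <- mtcars
motorTrend$amf <- factor(motorTrend$am, labels = c("Automatic", "Manual"))
fit2 <- lm(mpg ~ amf + factor(cyl) + disp + hp + wt, data = motorTrend)

# Multiple R-squared and adjusted R-squared (the latter penalises extra regressors)
fitSummary <- summary(fit2)
print(fitSummary$r.squared)
print(fitSummary$adj.r.squared)
```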

Supporting figures:

Exploratory Analysis (Type of Car vs Mileage)

plot(motorTrend$amf,motorTrend$mpg,col="green",ylab="MPG")

[Figure: box plot of MPG by transmission type (Automatic vs Manual)]

Residual plots for model fit1

par(mfrow=c(2,2))
plot(fit1)

[Figure: diagnostic plots for fit1 (Residuals vs Fitted, Normal Q-Q, Scale-Location, Residuals vs Leverage)]

Residual Plots for model fit2

par(mfrow=c(2,2))
plot(fit2)

[Figure: diagnostic plots for fit2 (Residuals vs Fitted, Normal Q-Q, Scale-Location, Residuals vs Leverage)]