This report analyses the mtcars data set to answer the following questions:
1. Is an automatic or manual transmission better for MPG?
2. Quantify the MPG difference between automatic and manual transmissions.
To answer these questions, I looked at the data set and found that the columns of most interest are mpg and am (automatic/manual transmission). I also tried fitting models with all the columns, but that only inflated the standard errors.
#load the mtcars data set into our variable
motorTrend <- mtcars
#Look at all the variables and their summary
#summary(motorTrend)
#Ran str() on motorTrend and found that some variables need to be converted to factors
str(motorTrend)
## 'data.frame': 32 obs. of 11 variables:
## $ mpg : num 21 21 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2 ...
## $ cyl : num 6 6 4 6 8 6 8 4 4 6 ...
## $ disp: num 160 160 108 258 360 ...
## $ hp : num 110 110 93 110 175 105 245 62 95 123 ...
## $ drat: num 3.9 3.9 3.85 3.08 3.15 2.76 3.21 3.69 3.92 3.92 ...
## $ wt : num 2.62 2.88 2.32 3.21 3.44 ...
## $ qsec: num 16.5 17 18.6 19.4 17 ...
## $ vs : num 0 0 1 1 0 1 0 1 1 1 ...
## $ am : num 1 1 1 0 0 0 0 0 0 0 ...
## $ gear: num 4 4 4 3 3 3 3 4 4 4 ...
## $ carb: num 4 4 1 1 2 1 4 2 2 4 ...
#Converted the am variable (0 = automatic, 1 = manual) to a labelled factor variable
motorTrend$amf <- factor(motorTrend$am,labels=c("Automatic","Manual"))
motorTrend$amf
## [1] Manual Manual Manual Automatic Automatic Automatic Automatic
## [8] Automatic Automatic Automatic Automatic Automatic Automatic Automatic
## [15] Automatic Automatic Automatic Manual Manual Manual Automatic
## [22] Automatic Automatic Automatic Automatic Manual Manual Manual
## [29] Manual Manual Manual Manual
## Levels: Automatic Manual
We can probably predict MPG using the amf variable, so I drew a box plot to see whether there is a relationship. There is a clear difference in MPG between manual transmission vehicles and automatic transmission vehicles.
Please see this plot in the appendix section
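The gap visible in the box plot can also be checked numerically. The following is a minimal sketch, not part of the original analysis, that rebuilds the `amf` factor and uses base R's `tapply` and `t.test` (a Welch two-sample test by default). The names `groupMeans` and `ttest` are introduced here for illustration only.

```r
# Rebuild the data so the example runs on its own
motorTrend <- mtcars
motorTrend$amf <- factor(motorTrend$am, labels = c("Automatic", "Manual"))

# Mean MPG for each transmission type
groupMeans <- tapply(motorTrend$mpg, motorTrend$amf, mean)
print(round(groupMeans, 3))

# Welch two-sample t-test of MPG by transmission type (illustrative check)
ttest <- t.test(mpg ~ amf, data = motorTrend)
print(ttest$p.value)
```

The difference between the two group means equals the amfManual coefficient of fit1 (7.245), because a regression on a single two-level factor simply reproduces the group means.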
I started modeling with a single variable (amf) in fit1, then expanded the model in fit2 to include additional variables (cyl, disp, hp and wt), and compared the two models using the anova function to see which one fits better.
#fit1 uses only amf as the regressor
fit1 <- lm(motorTrend$mpg ~ motorTrend$amf)
#fit2 uses amf, cyl, disp, hp and wt as regressors
fit2 <- lm(mpg ~ amf + factor(cyl) + disp + hp + wt, data=motorTrend)
#Compare the coefficients of fit1 and fit2
summary(fit1)$coef
##                      Estimate Std. Error t value  Pr(>|t|)
## (Intercept)            17.147      1.125  15.247 1.134e-15
## motorTrend$amfManual    7.245      1.764   4.106 2.850e-04
summary(fit2)$coef
##               Estimate Std. Error t value  Pr(>|t|)
## (Intercept)  33.864276    2.69542 12.5637 2.668e-12
## amfManual     1.806099    1.42108  1.2709 2.155e-01
## factor(cyl)6 -3.136067    1.46909 -2.1347 4.277e-02
## factor(cyl)8 -2.717781    2.89815 -0.9378 3.573e-01
## disp          0.004088    0.01277  0.3202 7.515e-01
## hp           -0.032480    0.01398 -2.3228 2.862e-02
## wt           -2.738695    1.17598 -2.3289 2.825e-02
#Compared the models using the anova function; model fit2 turns out to fit much better than fit1
#anova(fit1,fit2)
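The anova call above is commented out, so its output is not shown. A minimal sketch of how the comparison could be run follows. It assumes the same two models, with fit1 written using `data = motorTrend` (an equivalent formulation chosen here for tidiness), and the name `comparison` is hypothetical.

```r
# Rebuild the data and both models so the example runs on its own
motorTrend <- mtcars
motorTrend$amf <- factor(motorTrend$am, labels = c("Automatic", "Manual"))
fit1 <- lm(mpg ~ amf, data = motorTrend)
fit2 <- lm(mpg ~ amf + factor(cyl) + disp + hp + wt, data = motorTrend)

# F-test for nested models: does fit2 reduce the residual sum of squares
# by more than the five extra coefficients would be expected to by chance?
comparison <- anova(fit1, fit2)
print(comparison)

# p-value of the comparison (second row of the table)
print(comparison$`Pr(>F)`[2])
```

A small p-value in the second row indicates that the extra regressors in fit2 improve the fit beyond what chance alone would explain.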
Model fit2 shows that MPG is higher for manual transmissions and lower with more cylinders (cyl 6, cyl 8), more horsepower and more weight.
So, based on the results from the second model, manual transmission cars appear more fuel efficient than automatic cars. The difference is quantified by the amfManual coefficient in fit2: holding cylinders, displacement, horsepower and weight fixed, a manual car is estimated to get about 1.8 MPG more than an automatic one. This estimate has a standard error of 1.42 (p = 0.22), so the adjusted difference is not statistically significant at the 5% level. Residual plots for both fit1 and fit2 are shown in the appendix.
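The 1.8 MPG estimate comes with sampling uncertainty. As a hedged sketch (not part of the original analysis), base R's `confint` can attach a 95% confidence interval to the amfManual coefficient of fit2. The model is refit here so the snippet runs on its own, and the name `ci` is hypothetical.

```r
# Rebuild the data and the second model so the example runs on its own
motorTrend <- mtcars
motorTrend$amf <- factor(motorTrend$am, labels = c("Automatic", "Manual"))
fit2 <- lm(mpg ~ amf + factor(cyl) + disp + hp + wt, data = motorTrend)

# 95% confidence interval for the manual-vs-automatic difference,
# adjusted for cylinders, displacement, horsepower and weight
ci <- confint(fit2, "amfManual", level = 0.95)
print(ci)
```

Consistent with the p-value of 0.2155 reported for amfManual in the coefficient table, this interval spans zero.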
Uncertainty in the conclusion: the summary of model fit2 gives an R-squared value of 0.84, indicating that 84% of the variance in MPG is explained by the model. Explaining more of the variance would require more observations; the data set has only 32.
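The text does not say whether the 0.84 figure is the multiple or the adjusted R-squared. A minimal sketch for reading both values off the model summary, assuming fit2 is defined as above (the names `fitSummary`, `rsq` and `adjRsq` are hypothetical):

```r
# Rebuild the data and the second model so the example runs on its own
motorTrend <- mtcars
motorTrend$amf <- factor(motorTrend$am, labels = c("Automatic", "Manual"))
fit2 <- lm(mpg ~ amf + factor(cyl) + disp + hp + wt, data = motorTrend)

# Multiple R-squared: share of the variance in mpg explained by the model
fitSummary <- summary(fit2)
rsq <- fitSummary$r.squared

# Adjusted R-squared: the same measure penalised for the number of regressors
adjRsq <- fitSummary$adj.r.squared

print(round(c(r.squared = rsq, adj.r.squared = adjRsq), 2))
```

With only 32 observations and six estimated slopes, the adjusted value is the more conservative of the two to quote.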
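#Appendix: box plot of MPG by transmission type
plot(motorTrend$amf,motorTrend$mpg,col="green",ylab="MPG")
#Residual diagnostic plots for fit1 in a 2x2 grid
par(mfrow=c(2,2))
plot(fit1)
#Residual diagnostic plots for fit2 in a 2x2 grid
par(mfrow=c(2,2))
plot(fit2)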