In this project, I address two questions:

“Is an automatic or manual transmission better for MPG?”

“How large is the MPG difference between automatic and manual transmissions?”

data<-mtcars
data$am<-factor(data$am, labels=c("Auto","Manual"))
require(ggplot2)
## Loading required package: ggplot2
ggplot(data, aes(x=am, y=mpg, colour=am))+geom_boxplot()+xlab("Auto or Manual")+ylab("MPG")+ggtitle("Auto MPG versus Manual MPG")

Analysis

The boxplot above shows that manual-transmission cars have a higher median MPG than automatic-transmission cars, and most of the manual distribution sits above the automatic one. Taken alone, this suggests that someone who buys a manual car can expect better MPG than someone who buys an automatic car.

Build Linear Regression Model

reg<-lm(mpg~am,data=data)
summary(reg)
## 
## Call:
## lm(formula = mpg ~ am, data = data)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -9.3923 -3.0923 -0.2974  3.2439  9.5077 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   17.147      1.125  15.247 1.13e-15 ***
## amManual       7.245      1.764   4.106 0.000285 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 4.902 on 30 degrees of freedom
## Multiple R-squared:  0.3598, Adjusted R-squared:  0.3385 
## F-statistic: 16.86 on 1 and 30 DF,  p-value: 0.000285

 

From the above analysis, we find that manual cars average 7.245 more MPG than automatic cars. The intercept tells us that automatic cars average 17.147 MPG, while manual cars average 24.392 MPG (the sum of the intercept and the amManual coefficient). The amManual coefficient is statistically significant (p = 0.000285, below alpha = 0.05). However, the R-squared is only 0.3598, so transmission type alone explains about 36% of the variation in MPG. This is not a good model for predicting MPG; other variables need to be included.
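As a quick check on the arithmetic above, the following sketch recomputes the two group means directly and compares them with the fitted coefficients. It assumes only base R and the built-in mtcars data; the object names (cars, group_means, fit_simple) are chosen here for illustration and are not part of the original analysis.

```r
# Rebuild the data used above (base R only; mtcars ships with R)
cars <- mtcars
cars$am <- factor(cars$am, labels = c("Auto", "Manual"))

# Mean MPG within each transmission group
group_means <- tapply(cars$mpg, cars$am, mean)

# Simple regression of MPG on transmission type
fit_simple <- lm(mpg ~ am, data = cars)
coefs <- coef(fit_simple)

# With a two-level factor, the intercept is the Auto mean and
# intercept + amManual is the Manual mean
auto_from_model   <- unname(coefs["(Intercept)"])
manual_from_model <- unname(coefs["(Intercept)"] + coefs["amManual"])

print(round(group_means, 3))
print(round(c(Auto = auto_from_model, Manual = manual_from_model), 3))
```

The two printed vectors should agree, which is why the amManual coefficient can be read as the difference in mean MPG between the two groups.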

Build Another Model to Explain MPG with Automatic and Manual Transmissions

The appendix contains the graphs and tables I used for model selection. Several variables (cyl, vs, gear, and carb) are categorical and are converted to factors. Several predictors are too highly correlated with other variables in the data and are removed; the code below fits the model without them (disp, wt, carb, vs, qsec, gear, and drat). The anova comparison indicates that the second model fits better than the first. The second model estimates that manual cars get about 4.16 MPG more than automatic cars on average, holding horsepower and number of cylinders (4, 6, or 8) constant. The transmission coefficient is statistically significant (p = 0.00266). The Residuals vs Fitted and Normal Q-Q plots for the second model also look reasonable: the residuals appear randomly scattered with no clear pattern.

data$carb<-as.factor(data$carb)
data$gear<-as.factor(data$gear)
data$vs<-as.factor(data$vs)
data$cyl<-as.factor(data$cyl)
lmreg<-lm(mpg~.-disp-wt-carb-vs-qsec-gear-drat, data=data)
summary(lmreg)
## 
## Call:
## lm(formula = mpg ~ . - disp - wt - carb - vs - qsec - gear - 
##     drat, data = data)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
## -5.231 -1.535 -0.141  1.408  5.322 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 27.29590    1.42394  19.169  < 2e-16 ***
## cyl6        -3.92458    1.53751  -2.553  0.01666 *  
## cyl8        -3.53341    2.50279  -1.412  0.16943    
## hp          -0.04424    0.01458  -3.035  0.00527 ** 
## amManual     4.15786    1.25655   3.309  0.00266 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2.703 on 27 degrees of freedom
## Multiple R-squared:  0.8249, Adjusted R-squared:  0.7989 
## F-statistic: 31.79 on 4 and 27 DF,  p-value: 7.401e-10
anova(reg, lmreg)
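The model comparison and the size of the transmission effect can be reproduced with a sketch like the one below. It assumes that writing the formula out as mpg ~ cyl + hp + am is equivalent to the exclusion formula used above (it keeps the same three predictors); the names cars, fit_simple, fit_adj, cmp, and ci_am are illustrative. The confidence interval is an addition here, not part of the original analysis.

```r
# Rebuild the data (base R only)
cars <- mtcars
cars$am  <- factor(cars$am, labels = c("Auto", "Manual"))
cars$cyl <- factor(cars$cyl)

# Model 1: transmission only; Model 2: adds cylinders and horsepower
fit_simple <- lm(mpg ~ am, data = cars)
fit_adj    <- lm(mpg ~ cyl + hp + am, data = cars)

# Nested F-test: does adding cyl and hp reduce the residual sum of squares
# by more than chance would explain?
cmp <- anova(fit_simple, fit_adj)
print(cmp)

# 95% confidence interval for the adjusted manual-vs-automatic difference
ci_am <- confint(fit_adj, "amManual", level = 0.95)
print(ci_am)
```

A small p-value in the second row of the anova table supports keeping the extra predictors, and an interval for amManual that excludes zero is consistent with the significance reported in the coefficient table.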

 

Executive Summary

This report shows that manual-transmission cars get more MPG than automatic-transmission cars on average, by about 4.16 MPG, when holding horsepower and number of cylinders (4, 6, or 8) constant. The adjusted R-squared is about 0.80, meaning the model explains about 80 percent of the variation in MPG. This is a strong model.

APPENDIX

str(data)
## 'data.frame':    32 obs. of  11 variables:
##  $ mpg : num  21 21 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2 ...
##  $ cyl : Factor w/ 3 levels "4","6","8": 2 2 1 2 3 2 3 1 1 2 ...
##  $ disp: num  160 160 108 258 360 ...
##  $ hp  : num  110 110 93 110 175 105 245 62 95 123 ...
##  $ drat: num  3.9 3.9 3.85 3.08 3.15 2.76 3.21 3.69 3.92 3.92 ...
##  $ wt  : num  2.62 2.88 2.32 3.21 3.44 ...
##  $ qsec: num  16.5 17 18.6 19.4 17 ...
##  $ vs  : Factor w/ 2 levels "0","1": 1 1 2 2 1 2 1 2 2 2 ...
##  $ am  : Factor w/ 2 levels "Auto","Manual": 2 2 2 1 1 1 1 1 1 1 ...
##  $ gear: Factor w/ 3 levels "3","4","5": 2 2 2 1 1 1 1 2 2 2 ...
##  $ carb: Factor w/ 6 levels "1","2","3","4",..: 4 4 1 1 2 1 4 2 2 4 ...
cov(mtcars)
##              mpg         cyl        disp          hp         drat          wt
## mpg    36.324103  -9.1723790  -633.09721 -320.732056   2.19506351  -5.1166847
## cyl    -9.172379   3.1895161   199.66028  101.931452  -0.66836694   1.3673710
## disp -633.097208 199.6602823 15360.79983 6721.158669 -47.06401915 107.6842040
## hp   -320.732056 101.9314516  6721.15867 4700.866935 -16.45110887  44.1926613
## drat    2.195064  -0.6683669   -47.06402  -16.451109   0.28588135  -0.3727207
## wt     -5.116685   1.3673710   107.68420   44.192661  -0.37272073   0.9573790
## qsec    4.509149  -1.8868548   -96.05168  -86.770081   0.08714073  -0.3054816
## vs      2.017137  -0.7298387   -44.37762  -24.987903   0.11864919  -0.2736613
## am      1.803931  -0.4657258   -36.56401   -8.320565   0.19015121  -0.3381048
## gear    2.135685  -0.6491935   -50.80262   -6.358871   0.27598790  -0.4210806
## carb   -5.363105   1.5201613    79.06875   83.036290  -0.07840726   0.6757903
##              qsec           vs           am        gear        carb
## mpg    4.50914919   2.01713710   1.80393145   2.1356855 -5.36310484
## cyl   -1.88685484  -0.72983871  -0.46572581  -0.6491935  1.52016129
## disp -96.05168145 -44.37762097 -36.56401210 -50.8026210 79.06875000
## hp   -86.77008065 -24.98790323  -8.32056452  -6.3588710 83.03629032
## drat   0.08714073   0.11864919   0.19015121   0.2759879 -0.07840726
## wt    -0.30548161  -0.27366129  -0.33810484  -0.4210806  0.67579032
## qsec   3.19316613   0.67056452  -0.20495968  -0.2804032 -1.89411290
## vs     0.67056452   0.25403226   0.04233871   0.0766129 -0.46370968
## am    -0.20495968   0.04233871   0.24899194   0.2923387  0.04637097
## gear  -0.28040323   0.07661290   0.29233871   0.5443548  0.32661290
## carb  -1.89411290  -0.46370968   0.04637097   0.3266129  2.60887097
pairs(data)

plot(lmreg)
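Because covariances depend on each variable's units, a correlation matrix may make it easier to see which predictors overlap. The sketch below lists the variable pairs in mtcars whose absolute correlation exceeds 0.8. The 0.8 cutoff and the names cor_mat, high, and pairs_tbl are assumptions made for illustration, not the rule used in the original model selection.

```r
# Pairwise Pearson correlations on the original numeric mtcars columns
cor_mat <- cor(mtcars)

# Keep each pair once (upper triangle) where |r| exceeds the assumed cutoff
high <- which(abs(cor_mat) > 0.8 & upper.tri(cor_mat), arr.ind = TRUE)

pairs_tbl <- data.frame(
  var1 = rownames(cor_mat)[high[, "row"]],
  var2 = colnames(cor_mat)[high[, "col"]],
  r    = round(cor_mat[high], 3)
)

# Strongest relationships first
pairs_tbl <- pairs_tbl[order(-abs(pairs_tbl$r)), ]
print(pairs_tbl, row.names = FALSE)
```

Pairs that appear in this table are candidates for keeping only one member in the model, which is the motivation for dropping variables such as disp and wt.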