In this project, I answer two questions:
“Is an automatic or manual transmission better for MPG?”
“How large is the MPG difference between automatic and manual transmissions?”
data<-mtcars
data$am<-factor(data$am, labels=c("Auto","Manual"))
require(ggplot2)
## Loading required package: ggplot2
ggplot(data, aes(x = am, y = mpg, colour = am)) +
  geom_boxplot() +
  xlab("Auto or Manual") +
  ylab("MPG") +
  ggtitle("Auto MPG versus Manual MPG")
From the boxplot above, we see that manual-transmission cars have a higher median MPG than automatic-transmission cars, and the whole distribution sits higher. We would therefore expect someone who buys a manual car to get better MPG than someone who buys an automatic.
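A boxplot draws the median but not the mean, so the group summaries can be checked numerically. The following is a minimal sketch, not part of the original analysis; the names `group_mean` and `group_median` are illustrative.

```r
# Rebuild the labelled data frame so the snippet runs on its own.
data <- mtcars
data$am <- factor(data$am, labels = c("Auto", "Manual"))

# Mean and median MPG for each transmission type.
# group_mean and group_median are illustrative names.
group_mean   <- tapply(data$mpg, data$am, mean)
group_median <- tapply(data$mpg, data$am, median)

print(rbind(mean = group_mean, median = group_median))
```

If both rows are larger in the Manual column, the numbers agree with the boxplot.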
reg<-lm(mpg~am,data=data)
summary(reg)
##
## Call:
## lm(formula = mpg ~ am, data = data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -9.3923 -3.0923 -0.2974 3.2439 9.5077
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 17.147 1.125 15.247 1.13e-15 ***
## amManual 7.245 1.764 4.106 0.000285 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 4.902 on 30 degrees of freedom
## Multiple R-squared: 0.3598, Adjusted R-squared: 0.3385
## F-statistic: 16.86 on 1 and 30 DF, p-value: 0.000285
From the above regression, we find that manual cars get 7.245 more MPG on average than automatic cars. The intercept tells us that automatic cars average 17.147 MPG, while manual cars average 24.392 MPG (the sum of the intercept and the amManual coefficient). The amManual coefficient is statistically significant (p = 0.000285, below alpha = 0.05). However, the R-squared is only 35.98%, so transmission type alone explains little of the variation in MPG and this is not a good model for predicting it. Other variables need to be included.
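The sum of the intercept and the amManual coefficient can be reproduced directly from the fitted model. This is a minimal sketch: `auto_mean`, `manual_mean` and `new_cars` are illustrative names, and the model is refitted here only so the snippet is self-contained.

```r
data <- mtcars
data$am <- factor(data$am, labels = c("Auto", "Manual"))
reg <- lm(mpg ~ am, data = data)

b <- coef(reg)
# With a single two-level factor, the intercept is the mean of the
# reference level (Auto) and the slope is the difference in group means.
auto_mean   <- b[["(Intercept)"]]
manual_mean <- b[["(Intercept)"]] + b[["amManual"]]

# The same two values via predict(); new_cars is an illustrative name.
new_cars <- data.frame(am = factor(c("Auto", "Manual"), levels = levels(data$am)))
print(predict(reg, newdata = new_cars))
```

Because the only predictor is a two-level factor, the fitted values are simply the two group means.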
Please refer to the appendix for the graphs and tables I used for model selection. Some variables (carb, gear, vs, and cyl) are categorical, so I convert them to factors. Other variables are too highly correlated with the remaining predictors and are removed from the model; the code below fits the resulting model, which keeps cyl, hp, and am. According to the anova comparison, this second model is better than the first. The second model tells us that manual-transmission cars get 4.16 MPG more on average than automatic-transmission cars, holding horsepower and the number of cylinders (4, 6, or 8) constant. The transmission coefficient is statistically significant (p = 0.00266). Both the Residuals vs Fitted plot and the Normal Q-Q plot look good for the second model: the residuals appear randomly scattered, with no clear pattern.
data$carb<-as.factor(data$carb)
data$gear<-as.factor(data$gear)
data$vs<-as.factor(data$vs)
data$cyl<-as.factor(data$cyl)
lmreg<-lm(mpg~.-disp-wt-carb-vs-qsec-gear-drat, data=data)
summary(lmreg)
##
## Call:
## lm(formula = mpg ~ . - disp - wt - carb - vs - qsec - gear -
## drat, data = data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -5.231 -1.535 -0.141 1.408 5.322
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 27.29590 1.42394 19.169 < 2e-16 ***
## cyl6 -3.92458 1.53751 -2.553 0.01666 *
## cyl8 -3.53341 2.50279 -1.412 0.16943
## hp -0.04424 0.01458 -3.035 0.00527 **
## amManual 4.15786 1.25655 3.309 0.00266 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.703 on 27 degrees of freedom
## Multiple R-squared: 0.8249, Adjusted R-squared: 0.7989
## F-statistic: 31.79 on 4 and 27 DF, p-value: 7.401e-10
anova(reg, lmreg)
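The anova call compares the two nested models with an F test. Here is a minimal, self-contained sketch of reading that comparison. The formula `mpg ~ cyl + hp + am` is written out explicitly on the assumption that it matches the exclusion formula used above (those are the three predictors left after the removals). `cmp` and `p_value` are illustrative names.

```r
data <- mtcars
data$am  <- factor(data$am, labels = c("Auto", "Manual"))
data$cyl <- as.factor(data$cyl)

reg   <- lm(mpg ~ am, data = data)             # first model
lmreg <- lm(mpg ~ cyl + hp + am, data = data)  # second model, explicit formula

# F test for nested models: does adding cyl and hp reduce the
# residual sum of squares by more than chance would explain?
cmp <- anova(reg, lmreg)
print(cmp)

# p-value of the comparison (second row of the table).
p_value <- cmp[["Pr(>F)"]][2]
print(p_value)
```

A small p-value here supports keeping the extra terms, which is the sense in which the second model is "better" than the first.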
My report shows that manual-transmission cars get more MPG on average than automatic-transmission cars when horsepower and the number of cylinders (4, 6, or 8) are held constant. The adjusted R-squared is about 80 percent, which means the model explains about 80 percent of the variation in MPG. This is a strong model.
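To quantify the difference with a range rather than a single number, a confidence interval for the amManual coefficient can be computed. This is a sketch added for illustration, not part of the original report. It assumes the explicit formula `mpg ~ cyl + hp + am` matches the second model, and `ci` is an illustrative name.

```r
data <- mtcars
data$am  <- factor(data$am, labels = c("Auto", "Manual"))
data$cyl <- as.factor(data$cyl)
lmreg <- lm(mpg ~ cyl + hp + am, data = data)

# 95% confidence interval for the manual-vs-automatic difference,
# holding hp and cyl fixed.
ci <- confint(lmreg, "amManual", level = 0.95)
print(ci)
```

An interval that lies entirely above zero is consistent with the significant amManual coefficient reported above.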
str(data)
## 'data.frame': 32 obs. of 11 variables:
## $ mpg : num 21 21 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2 ...
## $ cyl : Factor w/ 3 levels "4","6","8": 2 2 1 2 3 2 3 1 1 2 ...
## $ disp: num 160 160 108 258 360 ...
## $ hp : num 110 110 93 110 175 105 245 62 95 123 ...
## $ drat: num 3.9 3.9 3.85 3.08 3.15 2.76 3.21 3.69 3.92 3.92 ...
## $ wt : num 2.62 2.88 2.32 3.21 3.44 ...
## $ qsec: num 16.5 17 18.6 19.4 17 ...
## $ vs : Factor w/ 2 levels "0","1": 1 1 2 2 1 2 1 2 2 2 ...
## $ am : Factor w/ 2 levels "Auto","Manual": 2 2 2 1 1 1 1 1 1 1 ...
## $ gear: Factor w/ 3 levels "3","4","5": 2 2 2 1 1 1 1 2 2 2 ...
## $ carb: Factor w/ 6 levels "1","2","3","4",..: 4 4 1 1 2 1 4 2 2 4 ...
cov(mtcars)
## mpg cyl disp hp drat wt
## mpg 36.324103 -9.1723790 -633.09721 -320.732056 2.19506351 -5.1166847
## cyl -9.172379 3.1895161 199.66028 101.931452 -0.66836694 1.3673710
## disp -633.097208 199.6602823 15360.79983 6721.158669 -47.06401915 107.6842040
## hp -320.732056 101.9314516 6721.15867 4700.866935 -16.45110887 44.1926613
## drat 2.195064 -0.6683669 -47.06402 -16.451109 0.28588135 -0.3727207
## wt -5.116685 1.3673710 107.68420 44.192661 -0.37272073 0.9573790
## qsec 4.509149 -1.8868548 -96.05168 -86.770081 0.08714073 -0.3054816
## vs 2.017137 -0.7298387 -44.37762 -24.987903 0.11864919 -0.2736613
## am 1.803931 -0.4657258 -36.56401 -8.320565 0.19015121 -0.3381048
## gear 2.135685 -0.6491935 -50.80262 -6.358871 0.27598790 -0.4210806
## carb -5.363105 1.5201613 79.06875 83.036290 -0.07840726 0.6757903
## qsec vs am gear carb
## mpg 4.50914919 2.01713710 1.80393145 2.1356855 -5.36310484
## cyl -1.88685484 -0.72983871 -0.46572581 -0.6491935 1.52016129
## disp -96.05168145 -44.37762097 -36.56401210 -50.8026210 79.06875000
## hp -86.77008065 -24.98790323 -8.32056452 -6.3588710 83.03629032
## drat 0.08714073 0.11864919 0.19015121 0.2759879 -0.07840726
## wt -0.30548161 -0.27366129 -0.33810484 -0.4210806 0.67579032
## qsec 3.19316613 0.67056452 -0.20495968 -0.2804032 -1.89411290
## vs 0.67056452 0.25403226 0.04233871 0.0766129 -0.46370968
## am -0.20495968 0.04233871 0.24899194 0.2923387 0.04637097
## gear -0.28040323 0.07661290 0.29233871 0.5443548 0.32661290
## carb -1.89411290 -0.46370968 0.04637097 0.3266129 2.60887097
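Covariances depend on the units of each variable, so a correlation matrix is often easier to read when screening for highly correlated predictors. The following is a minimal sketch of that alternative; the 0.8 cutoff is an arbitrary assumption for illustration, and `cors`, `high` and `high_pairs` are illustrative names.

```r
# Correlation matrix of the original numeric mtcars columns.
cors <- cor(mtcars)
print(round(cors, 2))

# List each pair of variables once (upper triangle only) whose absolute
# correlation exceeds the cutoff. The 0.8 cutoff is an assumption.
high <- which(abs(cors) > 0.8 & upper.tri(cors), arr.ind = TRUE)
high_pairs <- data.frame(
  var1 = rownames(cors)[high[, "row"]],
  var2 = colnames(cors)[high[, "col"]],
  r    = round(cors[high], 2)
)
print(high_pairs)
```

Pairs that appear in `high_pairs` are candidates for keeping only one of the two variables in the model.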
pairs(data)
plot(lmreg)
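The four diagnostic plots can be shown together on one page, and the visual reading of the Normal Q-Q plot can be complemented with a formal normality test. This is a sketch only: the Shapiro-Wilk test is an addition that the original analysis does not use, the explicit formula is assumed to match the second model, and `op` and `sw` are illustrative names.

```r
data <- mtcars
data$am  <- factor(data$am, labels = c("Auto", "Manual"))
data$cyl <- as.factor(data$cyl)
lmreg <- lm(mpg ~ cyl + hp + am, data = data)

# Arrange the four standard lm diagnostic plots in a 2 x 2 grid,
# then restore the previous graphics settings.
op <- par(mfrow = c(2, 2))
plot(lmreg)
par(op)

# Shapiro-Wilk test on the residuals (not part of the original analysis).
sw <- shapiro.test(resid(lmreg))
print(sw)
```

A large p-value from the test would be consistent with roughly normal residuals, but with only 32 observations the plots remain the main evidence.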