This analysis asks whether a manual or an automatic transmission is “better” for fuel economy. We address two questions: (1) “Is an automatic or manual transmission better for MPG?” and (2) “Quantify the MPG difference between automatic and manual transmissions.”
library(datasets)
library(ggplot2)
library(gridExtra)
data(mtcars)
The dataset ‘mtcars’ can be found in the R datasets package. It has 32 observations of 11 variables. Before we begin our analysis, we take a quick look at miles per gallon (mpg) with some basic summaries.
head(mtcars)
## mpg cyl disp hp drat wt qsec vs am gear carb
## Mazda RX4 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4
## Mazda RX4 Wag 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4
## Datsun 710 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1
## Hornet 4 Drive 21.4 6 258 110 3.08 3.215 19.44 1 0 3 1
## Hornet Sportabout 18.7 8 360 175 3.15 3.440 17.02 0 0 3 2
## Valiant 18.1 6 225 105 2.76 3.460 20.22 1 0 3 1
str(mtcars)
## 'data.frame': 32 obs. of 11 variables:
## $ mpg : num 21 21 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2 ...
## $ cyl : num 6 6 4 6 8 6 8 4 4 6 ...
## $ disp: num 160 160 108 258 360 ...
## $ hp : num 110 110 93 110 175 105 245 62 95 123 ...
## $ drat: num 3.9 3.9 3.85 3.08 3.15 2.76 3.21 3.69 3.92 3.92 ...
## $ wt : num 2.62 2.88 2.32 3.21 3.44 ...
## $ qsec: num 16.5 17 18.6 19.4 17 ...
## $ vs : num 0 0 1 1 0 1 0 1 1 1 ...
## $ am : num 1 1 1 0 0 0 0 0 0 0 ...
## $ gear: num 4 4 4 3 3 3 3 4 4 4 ...
## $ carb: num 4 4 1 1 2 1 4 2 2 4 ...
summary(mtcars$mpg)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 10.40 15.42 19.20 20.09 22.80 33.90
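Before any modelling, the raw difference between the two transmission groups can be computed directly. The following is a minimal sketch using only base R; the object names `group_means` and `raw_diff` are introduced here for illustration and are not part of the original analysis. Note that this comparison is unadjusted: it does not account for weight or any other variable.

```r
# Sketch: raw (unadjusted) mpg difference between transmission types.
# In mtcars, am = 0 is automatic and am = 1 is manual.
data(mtcars)

# Mean mpg per transmission type
group_means <- aggregate(mpg ~ am, data = mtcars, FUN = mean)
print(group_means)

# Difference in means (manual minus automatic)
raw_diff <- with(mtcars, mean(mpg[am == 1]) - mean(mpg[am == 0]))
print(raw_diff)

# Welch two-sample t-test of mpg by transmission type
print(t.test(mpg ~ am, data = mtcars))
```

The t-test only says whether the group means differ; the regression models are what attempt to separate the transmission effect from the other variables.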
The two plots below show the distribution of mpg and mpg split by transmission type (am: 0 = automatic, 1 = manual).
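# Histogram of mpg; refer to the column by name inside aes(), not via mtcars$mpg
P1 <- ggplot(data = mtcars, aes(x = mpg)) +
  geom_histogram(color = "red", fill = "green") +
  xlab("MPG") + ggtitle("MPG Frequency")
# Boxplot of mpg per transmission type (am: 0 = automatic, 1 = manual)
P2 <- ggplot(data = mtcars, aes(x = factor(am), y = mpg)) +
  geom_boxplot() +
  facet_grid(. ~ am) + labs(title = "MPG by Transmission Type", x = "am")
grid.arrange(P1, P2, ncol = 2)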
## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.
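The first model is a linear regression of mpg on every variable in the dataset, together with each variable's interaction with transmission type (factor(am)). Because am is itself among the predictors matched by the dot, two coefficients are reported as NA.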
fitall <- summary(lm(mpg ~ factor(am)*.,data=mtcars))
fitall
##
## Call:
## lm(formula = mpg ~ factor(am) * ., data = mtcars)
##
## Residuals:
## Min 1Q Median 3Q Max
## -2.0346 -0.7600 0.1089 0.5484 2.6959
##
## Coefficients: (2 not defined because of singularities)
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 8.64345 22.37276 0.386 0.7060
## factor(am)1 -146.55089 66.32350 -2.210 0.0473 *
## cyl -0.53391 1.17256 -0.455 0.6570
## disp -0.02025 0.01813 -1.117 0.2859
## hp 0.06223 0.04791 1.299 0.2184
## drat 0.59159 3.13258 0.189 0.8534
## wt 1.95413 2.32068 0.842 0.4162
## qsec -0.88432 0.78877 -1.121 0.2842
## vs 0.73891 2.61246 0.283 0.7821
## am NA NA NA NA
## gear 8.65416 4.05167 2.136 0.0540 .
## carb -4.81050 1.97648 -2.434 0.0315 *
## factor(am)1:cyl -0.74737 4.26142 -0.175 0.8637
## factor(am)1:disp 0.20017 0.15960 1.254 0.2337
## factor(am)1:hp -0.22268 0.13808 -1.613 0.1328
## factor(am)1:drat -5.54142 5.84742 -0.948 0.3620
## factor(am)1:wt -12.49602 5.07276 -2.463 0.0299 *
## factor(am)1:qsec 8.97928 3.21468 2.793 0.0162 *
## factor(am)1:vs 0.20419 5.28538 0.039 0.9698
## factor(am)1:am NA NA NA NA
## factor(am)1:gear 3.67430 7.25129 0.507 0.6215
## factor(am)1:carb 9.49905 4.16833 2.279 0.0418 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.877 on 12 degrees of freedom
## Multiple R-squared: 0.9625, Adjusted R-squared: 0.9031
## F-statistic: 16.2 on 19 and 12 DF, p-value: 8.251e-06
The fit looks very good: the R-squared is 0.9625, so the model explains about 96% of the variance in mpg. That figure is helped by fitting 19 terms to only 32 observations; the adjusted R-squared is 0.9031. We will check at least one more model, because several variables show significant coefficients, either as main effects or in interaction with transmission type: the number of carburetors (carb), weight in 1000 lbs (wt), and 1/4 mile time (qsec).
The next model is a multiple regression of mpg on am, carb, wt, and qsec.
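The coefficient table above reports effects conditional on all other terms in the model. How strongly each variable is associated with mpg on its own can be checked separately with pairwise correlations. This is a small sketch in base R; the name `mpg_cor` is an illustrative choice, not something from the original analysis.

```r
# Sketch: marginal (pairwise) Pearson correlation of mpg with each other variable.
data(mtcars)

# Row "mpg" of the correlation matrix, dropping the mpg-with-itself entry
mpg_cor <- cor(mtcars)["mpg", -1]

# Sort by absolute strength so the strongest associations come first
mpg_cor <- mpg_cor[order(abs(mpg_cor), decreasing = TRUE)]
print(round(mpg_cor, 3))
```

A variable can have a modest pairwise correlation and still have a significant coefficient once other variables are held fixed, so the two views answer different questions.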
fit4 <- summary(lm(mpg ~ am+carb+wt+qsec,data=mtcars))
fit4
##
## Call:
## lm(formula = mpg ~ am + carb + wt + qsec, data = mtcars)
##
## Residuals:
## Min 1Q Median 3Q Max
## -4.1184 -1.5414 -0.1392 1.2917 4.3604
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 12.8972 7.4725 1.726 0.095784 .
## am 3.5114 1.4875 2.361 0.025721 *
## carb -0.4886 0.4212 -1.160 0.256212
## wt -3.4343 0.8200 -4.188 0.000269 ***
## qsec 1.0191 0.3378 3.017 0.005507 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.444 on 27 degrees of freedom
## Multiple R-squared: 0.8568, Adjusted R-squared: 0.8356
## F-statistic: 40.39 on 4 and 27 DF, p-value: 5.064e-11
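The two fits so far can be compared side by side on more than raw R-squared. The sketch below refits both models as `lm` objects (the text stores only their summaries) and tabulates R-squared, adjusted R-squared, and AIC. The names `m_all`, `m_four`, and `comparison` are assumptions made for this illustration.

```r
# Sketch: compare the full-interaction model with the four-predictor model.
data(mtcars)

# Refit as lm objects so that AIC() and summary() fields are available
m_all  <- lm(mpg ~ factor(am) * ., data = mtcars)
m_four <- lm(mpg ~ am + carb + wt + qsec, data = mtcars)

comparison <- data.frame(
  model         = c("all terms with am interactions", "am + carb + wt + qsec"),
  r_squared     = c(summary(m_all)$r.squared, summary(m_four)$r.squared),
  adj_r_squared = c(summary(m_all)$adj.r.squared, summary(m_four)$adj.r.squared),
  aic           = c(AIC(m_all), AIC(m_four))
)
print(comparison)
```

Adjusted R-squared and AIC both penalise the number of fitted terms, which matters here because the larger model uses 19 terms on 32 observations.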
In this model the R-squared is about 86%, so we lose roughly 10 percentage points of explained variance compared with the first model, and we will go back to the first model. However, this model shows a strong relationship between mpg and both wt and qsec. To answer the question about transmission types, we fit a model that estimates the wt and qsec effects separately for each transmission type.
fit<- lm(mpg ~ factor(am):wt+factor(am):qsec,data=mtcars)
summary(fit)
##
## Call:
## lm(formula = mpg ~ factor(am):wt + factor(am):qsec, data = mtcars)
##
## Residuals:
## Min 1Q Median 3Q Max
## -3.9361 -1.4017 -0.1551 1.2695 3.8862
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 13.9692 5.7756 2.419 0.02259 *
## factor(am)0:wt -3.1759 0.6362 -4.992 3.11e-05 ***
## factor(am)1:wt -6.0992 0.9685 -6.297 9.70e-07 ***
## factor(am)0:qsec 0.8338 0.2602 3.205 0.00346 **
## factor(am)1:qsec 1.4464 0.2692 5.373 1.12e-05 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.097 on 27 degrees of freedom
## Multiple R-squared: 0.8946, Adjusted R-squared: 0.879
## F-statistic: 57.28 on 4 and 27 DF, p-value: 8.424e-13
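This gives an R-squared of about 89% (0.8946) without the noise from the other variables, whose coefficients were not significant. The slopes differ by transmission type: each additional 1000 lbs is associated with about 6.1 fewer mpg for manual cars versus about 3.2 fewer for automatics, and each additional second of quarter-mile time with about 1.4 versus 0.8 more mpg. This is the model we will use to explain our results and to plot residuals.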
par(mfrow=c(2,2))
plot(fit)
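To turn the fitted slopes into an MPG difference, the model can be evaluated for the same car under each transmission type. The sketch below is one possible way to do this, assuming a hypothetical "typical" car at the sample means of wt and qsec; the names `typical` and `pred_diff` are introduced here for illustration only.

```r
# Sketch: predicted mpg for the same car under each transmission type.
data(mtcars)
fit <- lm(mpg ~ factor(am):wt + factor(am):qsec, data = mtcars)

# Hypothetical car at the sample means of wt and qsec, one row per transmission
typical <- data.frame(
  am   = c(0, 1),              # 0 = automatic, 1 = manual
  wt   = mean(mtcars$wt),
  qsec = mean(mtcars$qsec)
)
typical$pred_mpg <- predict(fit, newdata = typical)
print(typical)

# Manual minus automatic at those values of wt and qsec
pred_diff <- typical$pred_mpg[2] - typical$pred_mpg[1]
print(pred_diff)
```

Because the wt and qsec slopes differ between the two groups, this difference is not a single constant: it changes with the weight and quarter-mile time chosen for the comparison.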