Introduction

In this analysis we examine whether a manual or an automatic transmission is “better” for fuel economy. We address two questions: (1) “Is an automatic or manual transmission better for MPG?” and (2) “Quantify the MPG difference between automatic and manual transmissions.”

Loading Data and Library

library(datasets)
library(ggplot2)
## Warning: package 'ggplot2' was built under R version 3.2.3
library(gridExtra)
## Warning: package 'gridExtra' was built under R version 3.2.3
data(mtcars)

Exploring our Dataset

The dataset ‘mtcars’ ships with the R datasets package. It has 32 observations of 11 variables. Before we begin our analysis, we take a quick look at miles per gallon (mpg) using some basic summaries.

head(mtcars)
##                    mpg cyl disp  hp drat    wt  qsec vs am gear carb
## Mazda RX4         21.0   6  160 110 3.90 2.620 16.46  0  1    4    4
## Mazda RX4 Wag     21.0   6  160 110 3.90 2.875 17.02  0  1    4    4
## Datsun 710        22.8   4  108  93 3.85 2.320 18.61  1  1    4    1
## Hornet 4 Drive    21.4   6  258 110 3.08 3.215 19.44  1  0    3    1
## Hornet Sportabout 18.7   8  360 175 3.15 3.440 17.02  0  0    3    2
## Valiant           18.1   6  225 105 2.76 3.460 20.22  1  0    3    1
str(mtcars)
## 'data.frame':    32 obs. of  11 variables:
##  $ mpg : num  21 21 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2 ...
##  $ cyl : num  6 6 4 6 8 6 8 4 4 6 ...
##  $ disp: num  160 160 108 258 360 ...
##  $ hp  : num  110 110 93 110 175 105 245 62 95 123 ...
##  $ drat: num  3.9 3.9 3.85 3.08 3.15 2.76 3.21 3.69 3.92 3.92 ...
##  $ wt  : num  2.62 2.88 2.32 3.21 3.44 ...
##  $ qsec: num  16.5 17 18.6 19.4 17 ...
##  $ vs  : num  0 0 1 1 0 1 0 1 1 1 ...
##  $ am  : num  1 1 1 0 0 0 0 0 0 0 ...
##  $ gear: num  4 4 4 3 3 3 3 4 4 4 ...
##  $ carb: num  4 4 1 1 2 1 4 2 2 4 ...
summary(mtcars$mpg)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   10.40   15.42   19.20   20.09   22.80   33.90

Next we plot the distribution of MPG and compare MPG by transmission type (am: 0 = automatic, 1 = manual):

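# Histogram of mpg; refer to the column by name inside aes() rather than mtcars$mpg
P1 <- ggplot(data = mtcars, aes(x = mpg)) +
  geom_histogram(color = 'red', fill = 'green') +
  xlab("MPG") + ggtitle("MPG Frequency")
# Boxplot of mpg, one panel per transmission type
P2 <- ggplot(data = mtcars, aes(am, mpg)) + geom_boxplot() +
  facet_grid(. ~ am) + labs(title = "MPG by Transmission Type")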

grid.arrange(P1,P2,ncol=2)
## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.
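
As a rough numeric companion to the boxplot, the unadjusted difference in MPG between the two transmission types can be sketched as below. This is only a first look, not part of the original analysis: it assumes a Welch two-sample t-test is an acceptable comparison and it ignores weight, qsec and every other variable. The names group_means and tt are chosen here for illustration.

```r
# Unadjusted comparison of mpg by transmission type
# (am: 0 = automatic, 1 = manual, per the mtcars documentation)
data(mtcars)

# Mean mpg within each transmission group
group_means <- tapply(mtcars$mpg, mtcars$am, mean)
group_means

# Welch two-sample t-test (t.test default: unequal variances);
# this does not adjust for any other variable
tt <- t.test(mpg ~ am, data = mtcars)
tt$conf.int   # confidence interval for (automatic mean - manual mean)
tt$p.value
```

Any gap seen here may partly reflect other differences between the cars, such as weight, which is why the regression models in the next section are needed.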

Model Selection

The first model is a linear regression of mpg on every variable in the dataset, together with the interaction of each variable with transmission type (am).

fitall <- summary(lm(mpg ~ factor(am)*.,data=mtcars))
fitall
## 
## Call:
## lm(formula = mpg ~ factor(am) * ., data = mtcars)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -2.0346 -0.7600  0.1089  0.5484  2.6959 
## 
## Coefficients: (2 not defined because of singularities)
##                    Estimate Std. Error t value Pr(>|t|)  
## (Intercept)         8.64345   22.37276   0.386   0.7060  
## factor(am)1      -146.55089   66.32350  -2.210   0.0473 *
## cyl                -0.53391    1.17256  -0.455   0.6570  
## disp               -0.02025    0.01813  -1.117   0.2859  
## hp                  0.06223    0.04791   1.299   0.2184  
## drat                0.59159    3.13258   0.189   0.8534  
## wt                  1.95413    2.32068   0.842   0.4162  
## qsec               -0.88432    0.78877  -1.121   0.2842  
## vs                  0.73891    2.61246   0.283   0.7821  
## am                       NA         NA      NA       NA  
## gear                8.65416    4.05167   2.136   0.0540 .
## carb               -4.81050    1.97648  -2.434   0.0315 *
## factor(am)1:cyl    -0.74737    4.26142  -0.175   0.8637  
## factor(am)1:disp    0.20017    0.15960   1.254   0.2337  
## factor(am)1:hp     -0.22268    0.13808  -1.613   0.1328  
## factor(am)1:drat   -5.54142    5.84742  -0.948   0.3620  
## factor(am)1:wt    -12.49602    5.07276  -2.463   0.0299 *
## factor(am)1:qsec    8.97928    3.21468   2.793   0.0162 *
## factor(am)1:vs      0.20419    5.28538   0.039   0.9698  
## factor(am)1:am           NA         NA      NA       NA  
## factor(am)1:gear    3.67430    7.25129   0.507   0.6215  
## factor(am)1:carb    9.49905    4.16833   2.279   0.0418 *
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.877 on 12 degrees of freedom
## Multiple R-squared:  0.9625, Adjusted R-squared:  0.9031 
## F-statistic:  16.2 on 19 and 12 DF,  p-value: 8.251e-06

The fit of this model is very good: the R-squared is 0.9625, which means it explains about 96% of the variance in mpg (the adjusted R-squared is 0.9031, on only 12 residual degrees of freedom). We will check at least one more model, because several variables show a strong relationship with MPG in this fit, specifically the number of carburetors (carb), weight in 1000 lbs (wt) and 1/4 mile time (qsec).

The next model is a multiple regression of mpg on am, carb, wt and qsec.

fit4 <- summary(lm(mpg ~ am+carb+wt+qsec,data=mtcars))
fit4
## 
## Call:
## lm(formula = mpg ~ am + carb + wt + qsec, data = mtcars)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -4.1184 -1.5414 -0.1392  1.2917  4.3604 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  12.8972     7.4725   1.726 0.095784 .  
## am            3.5114     1.4875   2.361 0.025721 *  
## carb         -0.4886     0.4212  -1.160 0.256212    
## wt           -3.4343     0.8200  -4.188 0.000269 ***
## qsec          1.0191     0.3378   3.017 0.005507 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2.444 on 27 degrees of freedom
## Multiple R-squared:  0.8568, Adjusted R-squared:  0.8356 
## F-statistic: 40.39 on 4 and 27 DF,  p-value: 5.064e-11

In this model the R-squared is about 86%, so we lose about 10 percentage points of explained variance and go back to the first model. However, this model shows a strong relationship between MPG and both wt and qsec. To answer the question about transmission type, we next fit these two variables separately for each transmission type.
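
Raw R-squared always rises as terms are added, so one way to check the comparison above is to look at adjusted R-squared and a nested-model F-test. The sketch below is an illustration, not the author's method. It refits both models as lm objects, because fitall and fit4 above hold summaries, and the names full and small are introduced here.

```r
# Compare the full interaction model with the four-variable model
data(mtcars)

full  <- lm(mpg ~ factor(am) * ., data = mtcars)          # all variables and am interactions
small <- lm(mpg ~ am + carb + wt + qsec, data = mtcars)   # reduced model

# Adjusted R-squared penalises the extra terms in the full model
c(full  = summary(full)$adj.r.squared,
  small = summary(small)$adj.r.squared)

# F-test of whether the additional terms in the full model
# reduce the residual sum of squares more than chance would suggest
anova(small, full)
```

With only 32 cars, the full model leaves few residual degrees of freedom, so this test should be read with caution.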

fit<- lm(mpg ~ factor(am):wt+factor(am):qsec,data=mtcars)
summary(fit)
## 
## Call:
## lm(formula = mpg ~ factor(am):wt + factor(am):qsec, data = mtcars)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -3.9361 -1.4017 -0.1551  1.2695  3.8862 
## 
## Coefficients:
##                  Estimate Std. Error t value Pr(>|t|)    
## (Intercept)       13.9692     5.7756   2.419  0.02259 *  
## factor(am)0:wt    -3.1759     0.6362  -4.992 3.11e-05 ***
## factor(am)1:wt    -6.0992     0.9685  -6.297 9.70e-07 ***
## factor(am)0:qsec   0.8338     0.2602   3.205  0.00346 ** 
## factor(am)1:qsec   1.4464     0.2692   5.373 1.12e-05 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2.097 on 27 degrees of freedom
## Multiple R-squared:  0.8946, Adjusted R-squared:  0.879 
## F-statistic: 57.28 on 4 and 27 DF,  p-value: 8.424e-13

This gives an R-squared of about 0.89 (adjusted 0.879) without the noise of the other variables whose coefficients were not significant. This is the model we will use to explain our results and plot residuals.
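
Because this model has separate wt and qsec slopes for each transmission type, the MPG difference between manual and automatic depends on the car's weight and quarter-mile time. A minimal sketch of quantifying it is to predict mpg for both transmission types at the same wt and qsec. Using the sample means as the "typical" car is an assumption made here for illustration, and the names typical, pred and mpg_diff are hypothetical.

```r
# Predicted mpg for an automatic and a manual car with identical wt and qsec
data(mtcars)
fit <- lm(mpg ~ factor(am):wt + factor(am):qsec, data = mtcars)

# Assumption: evaluate at the sample means of wt and qsec
typical <- data.frame(am   = c(0, 1),             # 0 = automatic, 1 = manual
                      wt   = mean(mtcars$wt),
                      qsec = mean(mtcars$qsec))

pred <- predict(fit, newdata = typical)

# Manual minus automatic at these wt and qsec values
mpg_diff <- unname(pred[2] - pred[1])
mpg_diff
```

Changing wt or qsec in typical changes the result, since the weight slope is steeper for manual cars in this fit.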

par(mfrow=c(2,2))
plot(fit)
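
The diagnostic plots can be complemented with a few numeric checks. The sketch below is one possible set, assumed here rather than taken from the original analysis: a Shapiro-Wilk test on the residuals and a look at the highest-leverage cars. It refits the model so that it runs on its own.

```r
# Numeric companions to the residual plots
data(mtcars)
fit <- lm(mpg ~ factor(am):wt + factor(am):qsec, data = mtcars)

res <- residuals(fit)

# Shapiro-Wilk test: a small p-value would suggest non-normal residuals
shapiro.test(res)

# Leverage (hat values); the largest ones identify the most influential rows
lev <- hatvalues(fit)
head(sort(lev, decreasing = TRUE), 3)
```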