Executive Summary

In this brief analysis we apply basic regression methods to a well-known dataset, ‘mtcars’, which ships with R in the ‘datasets’ package. We attempt to determine whether there is a significant difference in miles per gallon (mpg) between vehicles with automatic (0) and manual (1) transmissions, as coded in the ‘am’ regressor.
library('dplyr')
library('ggplot2')
data('mtcars')
time <- format(Sys.time(), "%a %b %d %X %Y")

#generate some summary statistics for the 'mtcars' dataset
summarise(
    avg = mean(mpg)
  ,stdev.mpg = sd(mpg)
  ,var.mpg = var(mpg)
  ,min.mpg = min(mpg)
  ,max.mpg = max(mpg)
  ,.data = mtcars
) -> dat.summary

#subset out the automatic transmission observations
cars.auto <- mtcars[mtcars$am==0, ]
cars.auto %>% #generate summary statistics by automatic transmission
  summarise(
    avg = mean(mpg)
    ,stdev.mpg = sd(mpg)
    ,var.mpg = var(mpg)
    ,min.mpg = min(mpg)
    ,max.mpg = max(mpg)
  ) -> auto.sum

#subset out the manual transmission observations
cars.man <- mtcars[mtcars$am==1, ]
cars.man %>% #generate summary statistics by manual transmission
  summarise(
    avg = mean(mpg)
    ,stdev.mpg = sd(mpg)
    ,var.mpg = var(mpg)
    ,min.mpg = min(mpg)
    ,max.mpg = max(mpg)
  ) -> man.sum
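
The two per-transmission summaries above repeat the same `summarise()` call on separate subsets. As a minimal sketch of a more compact alternative, the same statistics can be computed in one pass with `group_by()`. The name `by.trans` and the extra count column `n` are illustrative additions, not part of the original analysis.

```r
library(dplyr)
data(mtcars)

# by.trans (illustrative name): one row of mpg statistics per transmission type
# am == 0 is automatic, am == 1 is manual
by.trans <- mtcars %>%
  group_by(am) %>%
  summarise(
    n = n(),
    avg = mean(mpg),
    stdev.mpg = sd(mpg),
    var.mpg = var(mpg),
    min.mpg = min(mpg),
    max.mpg = max(mpg)
  )
by.trans
```

This keeps both groups in a single table, which makes the side-by-side comparison easier to read than two separate one-row data frames.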

Exploratory summary statistics of the overall mpg – average mpg = 20 with a standard deviation of 6 mpg.

dat.summary
##        avg stdev.mpg var.mpg min.mpg max.mpg
## 1 20.09062  6.026948 36.3241    10.4    33.9

Exploratory summary statistics of mpg for automatic transmission vehicles – average mpg = 17 with a standard deviation of 3.8 mpg.

auto.sum
##        avg stdev.mpg var.mpg min.mpg max.mpg
## 1 17.14737  3.833966 14.6993    10.4    24.4

Exploratory summary statistics of mpg for manual transmission vehicles – average mpg = 24 with a standard deviation of 6 mpg.

man.sum
##        avg stdev.mpg  var.mpg min.mpg max.mpg
## 1 24.39231  6.166504 38.02577      15    33.9
Comparing simple averages, manual transmissions come out clearly ahead of automatics (about 24 mpg versus 17 mpg). The next step of the exploratory analysis is a simple visualization.
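
A gap in simple averages does not by itself say whether the difference could be due to chance. One way to check, sketched here under the assumption that a Welch two-sample t-test is an acceptable first look (it is not part of the original analysis), is `t.test()` with a formula. The name `tt` is illustrative.

```r
data(mtcars)

# Welch two-sample t-test of mpg by transmission (0 = automatic, 1 = manual)
tt <- t.test(mpg ~ am, data = mtcars)

tt$estimate   # the two group means
tt$p.value    # two-sided p-value for the null hypothesis of equal means
tt$conf.int   # 95% confidence interval for (automatic mean - manual mean)
```

The Welch variant does not assume equal variances, which seems a reasonable default here given that the two groups have quite different standard deviations.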
Exploratory inference and regression approach
y <- mtcars$mpg
x <- mtcars$am
fit.1 <- lm(y ~ x)
e <- resid(fit.1)
yhat <- predict(fit.1)

A correlation comparison of mpg with the other variables gives us an idea of which other factors could influence mpg. We observe that mpg is positively correlated with 'drat', 'qsec', 'vs', 'am' (transmission), and 'gear', and negatively correlated with all the remaining variables ('cyl', 'disp', 'hp', 'wt', and 'carb').
cor(mtcars)[1,] #Keep only the row of correlations with mpg
##        mpg        cyl       disp         hp       drat         wt 
##  1.0000000 -0.8521620 -0.8475514 -0.7761684  0.6811719 -0.8676594 
##       qsec         vs         am       gear       carb 
##  0.4186840  0.6640389  0.5998324  0.4802848 -0.5509251
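
The raw correlation row is easier to interpret when ranked by strength and split by sign. The following is a small sketch of that step; the name `mpg.cor` is illustrative and not from the original analysis.

```r
data(mtcars)

# Correlation of mpg with every other variable (drop mpg's correlation with itself)
mpg.cor <- cor(mtcars)["mpg", ]
mpg.cor <- mpg.cor[names(mpg.cor) != "mpg"]

# Rank by absolute strength, strongest first
mpg.cor[order(abs(mpg.cor), decreasing = TRUE)]

# Split the variable names by the sign of their correlation with mpg
names(mpg.cor)[mpg.cor > 0]
names(mpg.cor)[mpg.cor < 0]
```

Ranking by absolute value highlights that several variables, weight in particular, are more strongly associated with mpg than transmission is.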

Is there a difference between automatic and manual transmissions in their effect on mpg?

We fit a simple linear model, mpg ~ am. The intercept (~17.1 mpg) is the average for automatics, and the 'am' coefficient says that manuals average about 7.2 mpg more (~24.4 mpg). The p-value for 'am' is small (about 0.0003), so we reject the null hypothesis that there is no difference between manual and automatic transmissions. However, the model explains only about 36% of the variance in mpg (R-squared = 0.36). It is likely not as simple as the transmission alone affecting mpg, because cars are complex machines and other design factors matter.
fit <- lm(mpg ~ am, data = mtcars)
summary(fit)$coefficients
summary(fit)$r.squared
##              Estimate Std. Error   t value     Pr(>|t|)
## (Intercept) 17.147368   1.124603 15.247492 1.133983e-15
## am           7.244939   1.764422  4.106127 2.850207e-04
## [1] 0.3597989
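
Because 'am' is a 0/1 indicator, the coefficients of this model can be read directly as group means. The sketch below checks that relationship and adds a 95% confidence interval for the difference. The names `fit.am`, `mean.auto`, and `mean.man` are illustrative and not part of the original analysis.

```r
data(mtcars)

# Refit the simple model (fit.am is an illustrative name)
fit.am <- lm(mpg ~ am, data = mtcars)

# Group means computed directly from the data
mean.auto <- mean(mtcars$mpg[mtcars$am == 0])
mean.man  <- mean(mtcars$mpg[mtcars$am == 1])

# Intercept = automatic mean; intercept + slope = manual mean
all.equal(unname(coef(fit.am)[1]), mean.auto)
all.equal(sum(coef(fit.am)), mean.man)

# 95% confidence interval for the manual-minus-automatic difference
confint(fit.am, "am", level = 0.95)
```

An interval for the 'am' coefficient that excludes zero is consistent with rejecting the null hypothesis of no difference in this single-predictor model.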

Building perhaps a more robust model

We fit a generalized linear model of the Gaussian family (equivalent to ordinary linear regression), this time adding the predictors ‘wt’ and ‘gear’ alongside the transmission. From this estimate we get perhaps a ‘cleaner’ picture of each predictor’s role in mpg. Namely, the estimated effect of transmission on mpg is much smaller at 0.59 mpg once we add weight and gear, and it is no longer statistically significant (p ≈ 0.78). Weight, by contrast, has a strong negative effect (which we know intuitively): each additional 1,000 lbs is associated with about 5.4 mpg lower fuel economy. The heavier a car is, the worse its fuel economy.
fit2 <- glm(mpg ~ am + wt + gear, data = mtcars)
summary(fit2)$coefficients
##               Estimate Std. Error    t value     Pr(>|t|)
## (Intercept) 39.2114278  5.2848951  7.4195280 4.427762e-08
## am           0.5937500  2.1008563  0.2826229 7.795447e-01
## wt          -5.3797696  0.8017495 -6.7100382 2.770328e-07
## gear        -0.5570034  1.2619415 -0.4413861 6.623234e-01
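
Two follow-up checks may help here, sketched under the assumption that a nested-model F-test is an acceptable way to compare the two fits (it is not part of the original analysis). First, a Gaussian `glm()` should reproduce the `lm()` estimates. Second, `anova()` can test whether 'wt' and 'gear' jointly improve on the transmission-only model. The names `fit.small`, `fit.lm`, `fit.glm`, and `model.cmp` are illustrative.

```r
data(mtcars)

# Illustrative names for the nested models
fit.small <- lm(mpg ~ am, data = mtcars)
fit.lm    <- lm(mpg ~ am + wt + gear, data = mtcars)
fit.glm   <- glm(mpg ~ am + wt + gear, data = mtcars)  # family = gaussian by default

# The Gaussian glm and the ordinary lm give the same coefficient estimates
all.equal(coef(fit.glm), coef(fit.lm), tolerance = 1e-6)

# F-test: do wt and gear jointly improve on the transmission-only model?
model.cmp <- anova(fit.small, fit.lm)
model.cmp
```

A small p-value in the second row of `model.cmp` would indicate that the larger model fits significantly better than the transmission-only model.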

Conclusion

There does not appear to be enough data, or a rigorous enough approach, to truly settle the question of whether automatic or manual transmissions are better. Superficially, it appears that manual transmissions are better simply because, on average (in our limited dataset), their mileage was better.
However, numerous other factors were not quantified, which makes it difficult to conclude that one is truly better than the other. The classes of the cars are all different. Did the automatic models all have unusually heavy drivers? What were the weather conditions? What type of fuel did they use, or did they all use the same fuel?

We observed that as we added more factors to the model, transmission had less of an effect on mpg. So we tentatively conclude that there is probably some small effect of transmission, but we did not test enough of the other predictor variables to really settle the dispute.

Appendix of Figures

Looking solely at automatic versus manual transmission, there appears to be an effect on mpg: the average for manual transmissions is about 7 mpg better (the solid black lines in the boxplot mark the group medians). So there clearly is a difference, but is it statistically significant?

#plot of mpg using 'am' as a factor variable
plot(mpg ~ as.factor(am), data = mtcars, xlab = "Transmission (0 = Auto/1=Man)", ylab = "mpg", main = "MPG for Auto vs Manual Transmissions")

The residual plot for the simple mpg ~ am model shows considerable spread around zero, which suggests that transmission alone leaves much of the variation in mpg unexplained.
#Residual plot with mpg on the horizontal axis
plot(mtcars$mpg, e, xlab = "mpg", ylab = "Residual"
     , main = 'Residuals of mpg by Transmission Factor'
     , bg = 'lightblue'
     , col = 'black'
     , cex = 2
     , pch = 21
     , frame = FALSE)
abline(h = 0, lwd = 2)
#Draw a red segment from each residual to the zero line
for(i in seq_along(mtcars$mpg))
  lines(c(mtcars$mpg[i], mtcars$mpg[i])
        , c(e[i], 0), col = 'red', lwd = 2)

This HTML markdown file was generated in knitr at: Mon Feb 01 8:37:29 AM 2016.