Executive Summary

Motor Trend recently performed an analysis on the ‘mtcars’ dataset to determine the relationship between transmission type (manual vs. automatic) and fuel economy (mpg).

We determined that there is a statistically significant difference in unadjusted mean mpg, with manual transmissions performing more efficiently than automatic. After controlling for confounding variables (e.g., horsepower and weight), we find that, on average, manual transmissions provide a ~2.08 mpg improvement.

Exploratory Analysis

To begin, we want to get a sense of the shape and quality of the data. This can be done using the ‘head’, ‘str’, and ‘summary’ functions, as well as a preliminary violin plot. The data appear clean, and the transmission variable (am) is coded as 0 and 1: 0 represents “Automatic” and 1 represents “Manual.” The underlying code for this output is in the appendix.
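As a minimal sketch of the inspection step described above (the name `am_counts` is introduced here for illustration and is not part of the original analysis):

```r
# Load the built-in dataset and inspect its shape and contents.
data("mtcars")
head(mtcars)      # first six rows
str(mtcars)       # column types; every column is numeric in the raw data
summary(mtcars)   # per-column quartiles and means

# am is coded 0 (automatic) / 1 (manual); count the cars in each group.
am_counts <- table(mtcars$am)
am_counts
```

Checking the group sizes up front is useful because the two transmission groups are not the same size, which is one reason a Welch (unequal-variance) t-test is a sensible default later on.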

##          am      mpg
## 1 Automatic 17.14737
## 2    Manual 24.39231

Looking at the chart, manual transmissions appear to be more fuel-efficient. On average, their mpg is ~7.2 greater than that of automatic transmissions (24.39 vs. 17.15). Let’s perform a t-test and a simple regression to see if the difference is statistically significant and how predictive transmission type is.

The results (in the appendix) show a clear difference. We use a two-sided t-test because the null hypothesis is that the two group means are equal. The p-value of ~0.001 indicates a statistically significant difference between the two. The regression reports an R2 of ~0.95, but because the model omits the intercept, that figure is measured against zero rather than against the mean mpg, so it overstates how much of the variation transmission explains; the correlation between am and mpg (~0.60, shown below) implies that transmission by itself accounts for roughly 36% of the variance. Let’s see whether other variables confound the relationship.
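The following sketch reproduces the t-test through the formula interface and contrasts the R2 of the no-intercept regression with that of the same model fitted with an intercept. The object names (`tt`, `fit_int`, `fit_noint`) are illustrative and do not come from the original appendix.

```r
data("mtcars")

# Welch two-sample t-test of mpg by transmission (am is 0/1 in the raw data).
tt <- t.test(mpg ~ am, data = mtcars)
tt$p.value

# With an intercept, R-squared is measured against the mean mpg.
fit_int <- lm(mpg ~ factor(am), data = mtcars)
r2_int <- summary(fit_int)$r.squared

# Without an intercept, R-squared is measured against zero instead.
fit_noint <- lm(mpg ~ factor(am) - 1, data = mtcars)
r2_noint <- summary(fit_noint)$r.squared

c(with_intercept = r2_int, no_intercept = r2_noint)
```

Both fits produce identical fitted values (the two group means); only the baseline used for R2 changes, which is why the two figures differ so much.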

We compute correlations with ‘cor’ to see how strongly each variable correlates with mpg. This guides which variables we include in the analysis.

##        mpg        cyl       disp         hp       drat         wt 
##  1.0000000 -0.8521620 -0.8475514 -0.7761684  0.6811719 -0.8676594 
##       qsec         vs         am       gear       carb 
##  0.4186840  0.6640389  0.5998324  0.4802848 -0.5509251
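The appendix does not list the call that produced the correlations above. One way to obtain them, assuming am is still numeric at that point (‘cor’ does not accept factor columns), is sketched here; `mpg_cor` is an illustrative name.

```r
data("mtcars")

# Correlation of every variable with mpg (first row of the correlation matrix).
mpg_cor <- cor(mtcars)["mpg", ]

# Rank candidates by strength of association, ignoring sign.
mpg_cor[order(abs(mpg_cor), decreasing = TRUE)]
```

Sorting by absolute value makes it easier to see which variables are most strongly associated with mpg before choosing covariates.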

## 
## Call:
## lm(formula = mpg ~ hp + wt + factor(am) - 1, data = mtcars)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -3.4221 -1.7924 -0.3788  1.2249  5.5317 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## hp          -0.037479   0.009605  -3.902 0.000546 ***
## wt          -2.878575   0.904971  -3.181 0.003574 ** 
## factor(am)0 34.002875   2.642659  12.867 2.82e-13 ***
## factor(am)1 36.086585   1.736338  20.783  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2.538 on 28 degrees of freedom
## Multiple R-squared:  0.9872, Adjusted R-squared:  0.9853 
## F-statistic: 538.2 on 4 and 28 DF,  p-value: < 2.2e-16

We included hp and wt. For each regression we used “-1” to remove the intercept term. This raises the reported R2 (which is then computed relative to zero rather than the mean) and leaves the transmission coefficients as absolute levels instead of relative differences. Therefore, the effect of manual transmissions (over automatic) is the difference between the two coefficients: 36.087 - 34.003 ≈ 2.08 mpg.
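As a hedged illustration of that arithmetic, the sketch below fits the same model with and without an intercept and checks that the gap between the two transmission coefficients equals the single transmission coefficient of the with-intercept form. The names `fit_noint`, `fit_int`, `diff_noint`, and `diff_int` are illustrative.

```r
data("mtcars")

# No-intercept form: one absolute level per transmission type.
fit_noint <- lm(mpg ~ hp + wt + factor(am) - 1, data = mtcars)
# With-intercept form: automatic is the baseline, manual is a difference.
fit_int <- lm(mpg ~ hp + wt + factor(am), data = mtcars)

diff_noint <- coef(fit_noint)[["factor(am)1"]] - coef(fit_noint)[["factor(am)0"]]
diff_int <- coef(fit_int)[["factor(am)1"]]

all.equal(diff_noint, diff_int)   # same quantity, two parameterizations

# Estimate, standard error, t value and p-value for the difference itself.
summary(fit_int)$coefficients["factor(am)1", ]
```

The with-intercept summary row also reports a standard error and p-value for the manual-versus-automatic difference, which the no-intercept summary does not show directly.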

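All coefficients in this model are statistically significant, and the model reports an R2 of ~0.99 (computed without an intercept, so relative to zero). Note that each of these tests compares a coefficient with zero; it does not by itself test whether the two transmission coefficients differ from each other. Looking at the residual plots (which can be found in the appendix), there do not appear to be any patterns or outliers that exert undue influence on the prediction.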
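The residual plots can be complemented with numeric influence measures. This is a sketch under the assumption that leverage and Cook's distance are acceptable summaries here; the original analysis relies on the plots only, and `fit` and `infl` are illustrative names.

```r
data("mtcars")
fit <- lm(mpg ~ hp + wt + factor(am) - 1, data = mtcars)

# Leverage (hat values) and Cook's distance for each car.
infl <- data.frame(
  leverage = hatvalues(fit),
  cooks_d  = cooks.distance(fit)
)

# The five observations with the largest Cook's distance.
head(infl[order(infl$cooks_d, decreasing = TRUE), ], 5)
```

Listing the largest values gives a quick cross-check on what the diagnostic plots suggest visually.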

APPENDIX

library(ggplot2)  # needed for the violin plot below
data("mtcars")

mtcars$am <- as.factor(mtcars$am)
levels(mtcars$am) <- c("Automatic", "Manual")

violin <- ggplot(data = mtcars, aes(x = am, y = mpg, fill = am)) +
  geom_violin(alpha = .5) +
  xlab("Transmission Type") + ylab("MPG") +
  scale_fill_discrete(name = "Transmission Type", labels = c("Automatic", "Manual"))
violin

aggregate(mpg ~ am, data=mtcars, mean)

# am was already converted to a labeled factor above, so split on its labels
mtcars_auto <- mtcars[mtcars$am == "Automatic",]
mtcars_man <- mtcars[mtcars$am == "Manual",]
t.test(mtcars_auto$mpg, mtcars_man$mpg)

## 
##  Welch Two Sample t-test
## 
## data:  mtcars_auto$mpg and mtcars_man$mpg
## t = -3.7671, df = 18.332, p-value = 0.001374
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -11.280194  -3.209684
## sample estimates:
## mean of x mean of y 
##  17.14737  24.39231

reg1 <- lm(mpg ~ factor(am) -1, data=mtcars)
summary(reg1)

## 
## Call:
## lm(formula = mpg ~ factor(am) - 1, data = mtcars)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -9.3923 -3.0923 -0.2974  3.2439  9.5077 
## 
## Coefficients:
##                     Estimate Std. Error t value Pr(>|t|)    
## factor(am)Automatic   17.147      1.125   15.25 1.13e-15 ***
## factor(am)Manual      24.392      1.360   17.94  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 4.902 on 30 degrees of freedom
## Multiple R-squared:  0.9487, Adjusted R-squared:  0.9452 
## F-statistic: 277.2 on 2 and 30 DF,  p-value: < 2.2e-16

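# reg2 is the multivariable model whose summary appears in the main text
reg2 <- lm(mpg ~ hp + wt + factor(am) - 1, data = mtcars)
anova(reg1, reg2)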

## Analysis of Variance Table
## 
## Model 1: mpg ~ factor(am) - 1
## Model 2: mpg ~ hp + wt + factor(am) - 1
##   Res.Df    RSS Df Sum of Sq      F    Pr(>F)    
## 1     30 720.90                                  
## 2     28 180.29  2    540.61 41.979 3.745e-09 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

par(mfrow = c(2,2))
plot(reg2)

Regression Models Final Project

Brad Allen

January 20, 2016
