Executive Summary

This report uses exploratory data analysis and regression modeling on the mtcars dataset to explore the relationship between a car’s transmission type and its miles per gallon (MPG). Specifically, it asks whether an automatic or a manual transmission is better for MPG, and by how much. Using stepwise model selection, we find that, holding weight and quarter-mile time constant, cars with manual transmissions average about 2.9 MPG more than cars with automatic transmissions. We also find, however, that weight and quarter-mile time are more strongly associated with MPG than transmission type is, and should be considered first.

Exploratory Data Analysis

First, we will perform some exploratory analysis to examine the relationship between transmission type and miles per gallon.

At first glance, manual transmissions appear to deliver better MPG than automatic transmissions. With this in mind, we move on to modeling to quantify the difference between transmission types and to explore the effect of other variables on MPG.
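The exploratory comparison described above can be sketched in base R as a boxplot of MPG by transmission plus the mean MPG in each group. This is an illustrative sketch, not necessarily the code behind the original figure; it relies only on the `am` coding used later in the report (0 = automatic, 1 = manual).

```r
# Base-R sketch of the exploratory comparison (am: 0 = automatic, 1 = manual)
data(mtcars)

# Side-by-side boxplots of MPG for each transmission type
boxplot(mpg ~ am, data = mtcars,
        names = c("Automatic", "Manual"),
        xlab = "Transmission", ylab = "Miles per gallon")

# Mean MPG within each transmission group
group_means <- tapply(mtcars$mpg, mtcars$am, mean)
print(round(group_means, 2))
```

The group means come out to about 17.1 MPG for automatic cars and 24.4 MPG for manual cars, a raw gap of roughly 7.2 MPG before any adjustment for other variables.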

Modeling

To begin, we use the code below to regress MPG on all other variables in the dataset, and then use the step function to iteratively drop (or re-add) variables according to AIC.

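# Full model: mpg regressed on every other column of mtcars
fitall <- lm(mpg ~ ., data = mtcars)
# Stepwise AIC selection; terms may be dropped or re-added at each step
stepFit <- step(fitall, direction = "both")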

The step function performs stepwise model selection based on Akaike’s Information Criterion (AIC). AIC estimates a model’s relative out-of-sample prediction error, so it can be used to compare the quality of several models fit to the same data; lower values are better. At each step, step() computes the AIC of every candidate model obtained by dropping one term from the current model (and, with direction = "both", by adding back a previously dropped term), moves to the candidate with the lowest AIC, and stops when no candidate improves on the current model. In this case, the model with the lowest AIC contains the Weight (wt), Quarter-Mile Time (qsec), and Transmission (am) variables. See the appendix for the table of steps returned by the step function.
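The selection procedure described above can be approximated by a hand-written backward-elimination loop built on `drop1()`, which reports the AIC of each single-term deletion alongside the current model (the `<none>` row). This is a simplified sketch, not how step() is implemented internally: it only considers deletions, whereas `direction = "both"` also tries re-adding terms. Because the appendix shows that no re-additions were chosen, the two procedures should arrive at the same model here.

```r
# Sketch: AIC-based backward elimination using drop1().
# Simplification: only deletions are tried (step(direction = "both") also tries additions).
data(mtcars)

fit <- lm(mpg ~ ., data = mtcars)
repeat {
  # One row per candidate deletion, plus "<none>" for the current model
  cand <- drop1(fit)
  best <- rownames(cand)[which.min(cand$AIC)]
  if (best == "<none>") break   # no single deletion lowers AIC, so stop
  fit <- update(fit, as.formula(paste(". ~ . -", best)))
}
print(formula(fit))
```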

To check that these predictors are not highly correlated with one another (multicollinearity), we calculate their variance inflation factors (VIFs). As shown below, all three VIFs are low (each below 2.6, well under the commonly cited rule-of-thumb thresholds of 5 or 10), so we proceed with this model.

##       wt     qsec       am 
## 2.482952 1.364339 2.541437
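For reference, the VIF of predictor j is 1 / (1 − R²ⱼ), where R²ⱼ is the R-squared from regressing that predictor on the other predictors. The report does not show which function produced the values above (`car::vif()` is one common choice), so the base-R sketch below is an assumption-free way to compute the same quantity directly; for a model with only numeric predictors it should reproduce the printed values.

```r
# Sketch: VIF_j = 1 / (1 - R^2_j), with R^2_j from regressing predictor j on the others
data(mtcars)

preds <- c("wt", "qsec", "am")
vifs <- sapply(preds, function(v) {
  others <- setdiff(preds, v)
  aux <- lm(reformulate(others, response = v), data = mtcars)  # auxiliary regression
  1 / (1 - summary(aux)$r.squared)
})
print(round(vifs, 6))
```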

A summary of the model is shown below. Because Transmission is coded as a binary indicator (0 = automatic, 1 = manual), the intercept corresponds to an automatic car with wt and qsec both equal to zero (not a realistic car), while the am estimate is the expected difference in MPG between a manual and an automatic car with the same weight and quarter-mile time. In other words, holding weight and quarter-mile time constant, manual cars can be expected to get approximately 2.9 more MPG than automatic cars. The p-value of 0.047 for am (t = 2.08) indicates that this difference is statistically significant at an alpha level of 0.05, though only narrowly. The coefficients also tell us that, holding the other variables constant, each 1000 lb increase in weight is associated with a decrease of approximately 3.9 MPG, and each additional second of quarter-mile time is associated with an increase of about 1.2 MPG.

## 
## Call:
## lm(formula = mpg ~ wt + qsec + am, data = mtcars)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -3.4811 -1.5555 -0.7257  1.4110  4.6610 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   9.6178     6.9596   1.382 0.177915    
## wt           -3.9165     0.7112  -5.507 6.95e-06 ***
## qsec          1.2259     0.2887   4.247 0.000216 ***
## am            2.9358     1.4109   2.081 0.046716 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2.459 on 28 degrees of freedom
## Multiple R-squared:  0.8497, Adjusted R-squared:  0.8336 
## F-statistic: 52.75 on 3 and 28 DF,  p-value: 1.21e-11
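The coefficient interpretation discussed above can be checked directly against the fitted model. The sketch below predicts MPG for two hypothetical cars that differ only in transmission (the wt and qsec values are arbitrary, chosen purely for illustration), recovers the am p-value from its t value and the 28 residual degrees of freedom, and prints a 95% confidence interval for the am effect.

```r
# Sketch: checking the interpretation of the am coefficient
data(mtcars)
fit <- lm(mpg ~ wt + qsec + am, data = mtcars)

# Two hypothetical cars identical except for transmission
# (wt = 3 means 3000 lb; qsec = 18 seconds; illustrative values only)
new_cars <- data.frame(wt = 3, qsec = 18, am = c(0, 1))
pred <- predict(fit, newdata = new_cars)
print(diff(pred))  # manual minus automatic, equal to coef(fit)["am"]

# Two-sided p-value for am from its t value and the residual degrees of freedom
t_am <- coef(summary(fit))["am", "t value"]
p_am <- 2 * pt(-abs(t_am), df = df.residual(fit))
print(p_am)

# 95% confidence interval for the manual-vs-automatic difference
print(confint(fit)["am", ])
```

Because the p-value is just under 0.05, the lower end of the 95% interval sits only slightly above zero, which is consistent with describing the effect as narrowly significant.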

Uncertainty and Residuals

The summary above also shows that, of the three predictors in the model, transmission has the largest standard error and the largest p-value. Because the predictors are measured in different units, their standard errors are not directly comparable, but the t values (-5.51 for wt and 4.25 for qsec, versus 2.08 for am) show that the weight and quarter-mile time effects are estimated much more precisely relative to their size. This suggests that a car’s weight and quarter-mile time deserve closer consideration than its transmission when predicting its MPG. In the residual plots (see appendix), specifically the Residuals vs Fitted plot, the trend line stays mostly around zero, although a few outliers (Chrysler Imperial, Fiat 128, and Toyota Corolla) pull it away somewhat.
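The residual plots referred to above can be regenerated with base R’s `plot()` method for `lm` objects, which draws Residuals vs Fitted, a normal Q-Q plot, Scale-Location, and Residuals vs Leverage. The sketch also lists the three cars with the largest absolute residuals; by default (`id.n = 3`) these should be the same three points labeled in the Residuals vs Fitted panel, which makes it easy to confirm which cars the outliers are.

```r
# Sketch: regenerating the residual diagnostics for the selected model
data(mtcars)
fit <- lm(mpg ~ wt + qsec + am, data = mtcars)

# Four standard diagnostic plots in a 2 x 2 grid
par(mfrow = c(2, 2))
plot(fit)
par(mfrow = c(1, 1))

# Cars with the three largest absolute residuals
top3 <- head(sort(abs(residuals(fit)), decreasing = TRUE), 3)
print(round(top3, 3))
```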

Appendix

Stepwise selection path (the anova table returned by the step function)

##     Step Df   Deviance Resid. Df Resid. Dev      AIC
## 1        NA         NA        21   147.4944 70.89774
## 2  - cyl  1 0.07987121        22   147.5743 68.91507
## 3   - vs  1 0.26852280        23   147.8428 66.97324
## 4 - carb  1 0.68546077        24   148.5283 65.12126
## 5 - gear  1 1.56497053        25   150.0933 63.45667
## 6 - drat  1 3.34455117        26   153.4378 62.16190
## 7 - disp  1 6.62865369        27   160.0665 61.51530
## 8   - hp  1 9.21946935        28   169.2859 61.30730
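The table above is the `anova` component of the object returned by step(), so it can be reproduced as sketched below. The AIC column is on the scale of `extractAIC()`, which for a linear model is n·log(RSS/n) + 2·edf; for a fixed dataset this differs from `stats::AIC()` only by an additive constant, so only differences between rows are meaningful. The sketch recomputes the final row’s AIC by hand from the residual sum of squares (the Resid. Dev column) as a check.

```r
# Sketch: reproducing the stepwise selection table and its final AIC
data(mtcars)

# trace = 0 suppresses the per-step printout
stepFit <- step(lm(mpg ~ ., data = mtcars), direction = "both", trace = 0)
print(stepFit$anova)

# AIC on step()'s scale for a linear model: n * log(RSS / n) + 2 * edf
n   <- nrow(mtcars)
rss <- sum(residuals(stepFit)^2)   # matches the final Resid. Dev value
edf <- length(coef(stepFit))       # intercept + wt + qsec + am
manual_aic <- n * log(rss / n) + 2 * edf
print(c(manual = manual_aic, extractAIC = extractAIC(stepFit)[2]))
```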

Residual plots