This report uses data analysis and regression modeling with the mtcars dataset to explore the relationship between a car’s transmission and its Miles per Gallon (MPG), specifically whether automatic or manual transmission is better for MPG and how large the difference is. By following a model fitting procedure, we find that, holding weight and quarter-mile time constant, cars with manual transmissions average about 2.9 MPG more than cars with automatic transmissions. We also find, however, that the weight and quarter-mile time of the car are more strongly associated with MPG and should be considered before transmission type.
First, we will perform some exploratory analysis to examine the relationship between transmission type and miles per gallon.
At first glance, it looks like cars with manual transmissions typically have better MPG than cars with automatic transmissions. With this in mind, we will begin modeling to quantify the difference between transmission types and explore the effect of other variables on MPG.
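As a rough numerical check on that first impression, the group means can be computed directly. This is a minimal sketch in base R; the names `trans` and `mpg_by_trans` are introduced here for illustration and do not come from the original analysis.

```r
# Label the 0/1 transmission indicator for readability (am: 0 = automatic, 1 = manual)
trans <- factor(mtcars$am, levels = c(0, 1), labels = c("automatic", "manual"))

# Mean MPG within each transmission group
mpg_by_trans <- tapply(mtcars$mpg, trans, mean)
print(round(mpg_by_trans, 2))

# Side-by-side boxplot of the two groups
boxplot(mtcars$mpg ~ trans, xlab = "Transmission", ylab = "Miles per gallon")
```

A raw difference in group means does not adjust for other variables such as weight, which is why the modeling that follows is still needed.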
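To begin, we will use the code below to fit a linear model of MPG on all other variables in the dataset, then use the step function to iteratively remove variables that do not improve the model.
# Fit the full model: mpg regressed on every other column of mtcars
fitall <- lm(mpg ~ ., data = mtcars)
# Stepwise selection by AIC; "both" allows terms to be dropped and re-added
stepFit <- step(fitall, direction = "both")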
The step function uses a stepwise algorithm to choose a model based on Akaike’s Information Criterion (AIC). AIC is an estimator of out-of-sample prediction error and can be used to compare the relative quality of a set of models for a given set of data. At each iteration, the step function computes the AIC of the current model, makes the single change (dropping a variable or, with direction = "both", re-adding one) that lowers AIC the most, and repeats until no single change lowers AIC further. In this case, the selected model included the Weight (wt), Quarter-Mile Time (qsec), and Transmission (am) variables. See the appendix for the sequence of steps recorded by the step function.
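For a linear model, the AIC value that step reports can be reproduced by hand, which makes the comparison concrete. The sketch below assumes the form n log(RSS / n) + 2p that `extractAIC` uses for `lm` fits, where constants shared by all models are dropped; `step_aic`, `full_fit`, and `reduced_fit` are illustrative names, not part of the original analysis.

```r
# AIC on the scale used by step() for lm fits: n * log(RSS / n) + 2 * p,
# where p is the number of estimated coefficients
step_aic <- function(fit) {
  n   <- length(residuals(fit))
  rss <- sum(residuals(fit)^2)
  p   <- n - df.residual(fit)
  n * log(rss / n) + 2 * p
}

full_fit    <- lm(mpg ~ ., data = mtcars)
reduced_fit <- lm(mpg ~ wt + qsec + am, data = mtcars)

# Compare the hand-computed values for the full and reduced models
print(c(full = step_aic(full_fit), reduced = step_aic(reduced_fit)))

# Cross-check against R's own calculation (second element is the AIC)
print(extractAIC(full_fit)[2])
```

Only differences in AIC between models fitted to the same data are meaningful, so the absolute values carry no interpretation on their own.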
To make sure that these variables are not highly correlated with one another, we will calculate their variance inflation factors (VIF). As shown below, all three VIFs are below 3, well under the commonly used thresholds of 5 or 10, so we proceed with this model.
##       wt     qsec       am
## 2.482952 1.364339 2.541437
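The code that produced these values is not shown in the report. One way to obtain VIFs without extra packages is to regress each predictor on the others and apply VIF = 1 / (1 - R²). The sketch below does this under that assumption; `manual_vif` is a hypothetical helper name.

```r
# VIF for each predictor: regress it on the remaining predictors,
# then compute 1 / (1 - R^2) from that auxiliary regression
manual_vif <- function(data, predictors) {
  sapply(predictors, function(p) {
    others <- setdiff(predictors, p)
    r2 <- summary(lm(reformulate(others, response = p), data = data))$r.squared
    1 / (1 - r2)
  })
}

vifs <- manual_vif(mtcars, c("wt", "qsec", "am"))
print(vifs)
```

A VIF of 1 means a predictor is uncorrelated with the others; larger values indicate that its coefficient's variance is inflated by collinearity.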
A summary of the model is shown below. Because Transmission is coded as a binary indicator (0 = automatic, 1 = manual), the am estimate represents the expected difference in MPG between manual and automatic cars of the same weight and quarter-mile time. The intercept is the predicted MPG for an automatic car with a weight and quarter-mile time of zero, so it has no practical interpretation on its own. Holding weight and quarter-mile time constant, manual cars can be expected to get approximately 2.9 more MPG than automatic cars. The p-value of 0.047 for am (t = 2.081) shows that the difference between the two transmission types is statistically significant at an alpha level of 0.05, though only narrowly. The coefficients also tell us that each 1000 lb increase in weight is associated with a decrease of approximately 3.9 MPG, and each additional second of quarter-mile time with an increase of about 1.2 MPG.
##
## Call:
## lm(formula = mpg ~ wt + qsec + am, data = mtcars)
##
## Residuals:
##     Min      1Q  Median      3Q     Max
## -3.4811 -1.5555 -0.7257  1.4110  4.6610
##
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept)   9.6178     6.9596   1.382 0.177915
## wt           -3.9165     0.7112  -5.507 6.95e-06 ***
## qsec          1.2259     0.2887   4.247 0.000216 ***
## am            2.9358     1.4109   2.081 0.046716 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.459 on 28 degrees of freedom
## Multiple R-squared: 0.8497, Adjusted R-squared: 0.8336
## F-statistic: 52.75 on 3 and 28 DF, p-value: 1.21e-11
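To gauge the uncertainty around the transmission estimate, a confidence interval can be read off the same fit. This is a minimal sketch that refits the selected model under the illustrative name `fit`; the two cars in `new_cars` are hypothetical and differ only in transmission.

```r
fit <- lm(mpg ~ wt + qsec + am, data = mtcars)

# 95% confidence intervals for the coefficients; show the row for am
ci <- confint(fit, level = 0.95)
print(ci["am", ])

# Predicted MPG for two hypothetical cars that differ only in transmission,
# both at the sample mean weight and quarter-mile time
new_cars <- data.frame(wt = mean(mtcars$wt), qsec = mean(mtcars$qsec), am = c(0, 1))
pred <- predict(fit, newdata = new_cars)

# The difference between the two predictions equals the am coefficient
print(unname(diff(pred)))
```

If the lower bound of the interval sits close to zero, that is consistent with a p-value only slightly below 0.05.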
The summary above also shows us that, of the 3 variables included in the model, transmission has the largest standard error, the smallest absolute t value, and is the least statistically significant. This tells us that closer consideration should be paid to the weight and quarter-mile time of a car than to its transmission when trying to predict its MPG. From the residual plots (see appendix), specifically the Residuals vs Fitted plot, we can see that the trend line falls mostly around zero, and that a few outliers (Chrysler Imperial, Fiat 128, and Toyota Corolla) are skewing the line somewhat.
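The residual diagnostics mentioned above can be regenerated from the fitted model. The sketch below refits the selected model (as the illustrative object `fit`), draws the Residuals vs Fitted plot, and lists the cars with the largest absolute residuals, which are the points that plot labels by default.

```r
fit <- lm(mpg ~ wt + qsec + am, data = mtcars)

# Residuals vs Fitted plot (the first of the standard lm diagnostic plots)
plot(fit, which = 1)

# Three observations with the largest absolute residuals
res <- residuals(fit)
largest <- head(sort(abs(res), decreasing = TRUE), 3)
print(names(largest))
```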
##     Step Df   Deviance Resid. Df Resid. Dev      AIC
## 1        NA         NA        21   147.4944 70.89774
## 2  - cyl  1 0.07987121        22   147.5743 68.91507
## 3   - vs  1 0.26852280        23   147.8428 66.97324
## 4 - carb  1 0.68546077        24   148.5283 65.12126
## 5 - gear  1 1.56497053        25   150.0933 63.45667
## 6 - drat  1 3.34455117        26   153.4378 62.16190
## 7 - disp  1 6.62865369        27   160.0665 61.51530
## 8   - hp  1 9.21946935        28   169.2859 61.30730
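A table with this layout is stored in the `anova` component of the object returned by step. The sketch below assumes the same call used earlier in the report, with `trace = 0` added here only to silence the per-step printout.

```r
fitall  <- lm(mpg ~ ., data = mtcars)
stepFit <- step(fitall, direction = "both", trace = 0)

# Sequence of terms dropped, with residual deviance and AIC after each step
print(stepFit$anova)

# Terms retained in the selected model
kept <- attr(terms(stepFit), "term.labels")
print(kept)
```

Each row shows the variable removed at that step and the AIC of the resulting model, so reading down the AIC column traces how the criterion falls as variables are dropped.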