Executive Summary

In 1974, Motor Trend US magazine published data detailing 11 design and performance attributes of 32 1973-1974 car models. This study used that data to answer two questions. First, does an automatic or a manual transmission get better gas mileage, as measured by miles per gallon (MPG)? Second, what is the quantifiable difference in MPG between cars with manual and automatic transmissions?

From the results of the study, it can be concluded that a statistically significant relationship between MPG and transmission type exists when these two variables are considered in isolation: manual transmissions averaged 7.24 MPG more than automatic transmissions. However, when other predictor variables were introduced into the model, transmission type lost significance and the number of cylinders, horsepower and vehicle weight were found to have a greater impact on fuel economy. In the multivariate model, manual transmissions were associated with an increase in gas mileage of only 1.81 MPG. Of all predictor variables considered, weight had the greatest impact. Fuel economy was shown to drop 2.5 MPG for every 1,000 lb increase in vehicle weight.

Exploratory Data Analysis and Results

Mean Fuel Economy by Transmission Type
The data set and the transformations made prior to analysis are summarized in Appendix [1]. Figure 1 shows mean miles per gallon for both manual and automatic transmissions. Manual transmissions achieve 7.24 MPG more on average than automatic transmissions.
[Figure 1: mean MPG by transmission type]

Although the means appeared to differ, a t-test was conducted to determine whether the mean MPG values for automatic and manual transmissions were statistically different from one another. Prior to conducting the t-test, the MPG data was tested for normality using a Shapiro-Wilk test. The test yielded a p-value greater than 0.05 (p= 0.1229), so the null hypothesis that the data is normally distributed could not be rejected.

It can reliably be inferred that a statistically significant relationship exists (t= -3.7671, p= 0.001) between transmission type and MPG in a bivariate setting. On average, cars with manual transmissions achieved an MPG of 24.39 while their automatic counterparts achieved an average MPG of 17.15, a difference of 7.24 MPG. The p-value of the t-test demonstrated that the means were statistically different from one another and allowed the rejection of the null hypothesis that the means were equal. The Shapiro-Wilk test results for normality and the t-test results appear in Appendix [2].

Regression Analysis
A bivariate regression of MPG on transmission type (Model 1) was then used to quantify the difference in fuel economy between manual and automatic transmissions. The results of Model 1 replicated the t-test: manual transmissions achieve 7.24 more miles per gallon than automatic transmissions (p= 0.0003). Model 1 explained 34% of variation in the response variable MPG according to its Adjusted R-squared of 0.338.

Backward stepwise selection and the Akaike Information Criterion (AIC) were then used to optimize a multivariate model that addressed the impact of confounding variables. The model mpg ~ cylinders + horsepower + weight + transmission had the lowest AIC score (61.65) and was therefore chosen as the optimal regression model (Model 2). Model 2 produced an Adjusted R-squared of 0.84, explaining 84% of the variation in MPG. An analysis of variance (ANOVA) test was then performed to test whether Model 2 fit the data significantly better than Model 1. The ANOVA test returned F= 24.5, p= 1.7e-08, which supported the alternative hypothesis that the multivariate model was superior.

According to Model 2, transmission type was not a significant predictor of MPG (p= 0.21), and manual transmissions achieved only 1.81 MPG more on average than automatic transmissions once the other predictors were held constant. Instead, vehicle weight was the most significant predictor of MPG (p= 0.0091). Fuel economy was shown to drop 2.5 MPG for every 1,000 lb increase in vehicle weight. Details on model fitting and the ANOVA test are presented in Appendix [3].
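
To illustrate how the Model 2 coefficients combine, the sketch below computes a fitted value by hand from the estimates reported in Appendix [3]. The example car (4 cylinders, 100 horsepower, 2,500 lb, manual transmission) is hypothetical and chosen only for illustration; weight is expressed in units of 1,000 lb, as in the mtcars documentation.

```r
# Coefficient estimates copied from the Model 2 summary in Appendix [3].
b <- c(intercept = 33.7083, cylinders6 = -3.0313, cylinders8 = -2.1637,
       horsepower = -0.0321, weight = -2.4968, transmissionmanual = 1.8092)

# Hypothetical car: 4 cylinders (the baseline level), 100 hp,
# 2.5 thousand lb, manual transmission.
x <- c(intercept = 1, cylinders6 = 0, cylinders8 = 0,
       horsepower = 100, weight = 2.5, transmissionmanual = 1)

# Fitted value = sum of coefficient * predictor value.
predicted_mpg <- sum(b * x)
predicted_mpg

# The same car with an automatic transmission differs only by the
# transmission coefficient.
predicted_mpg - b[["transmissionmanual"]]
```

Because the model is additive, changing one predictor shifts the fitted value by that predictor's coefficient times the change, which is how the 2.5 MPG per 1,000 lb figure is read off the weight estimate.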

Analysis of Residuals
Next, residual plots for Model 2 were examined for abnormalities. The residuals appeared to be approximately normally distributed and evenly scattered around zero. See Appendix [4], Figure 2 for the Residual Plots.

Appendix

[1] The Data Set and Transformations
mtcars contains observations of the following 11 variables for 32 automobile models (original column names): mpg, cyl, disp, hp, drat, wt, qsec, vs, am, gear and carb.

Columns were renamed to more interpretable names and the variables cylinders, vs, gears, carburators and transmission were converted to factors prior to analysis.
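
The renaming and factor conversion described above could look roughly like the following. This is a sketch, not the original preprocessing code: the names mpg, cylinders, horsepower, weight and transmission and the labels automatic/manual are taken from the output in this appendix, gears and carburators follow the spelling used above, and leaving disp, drat and qsec under their original names is an assumption. The object name cars is also hypothetical.

```r
# Work on a copy so the built-in data set is left untouched.
cars <- datasets::mtcars

# Rename selected columns (mapping partly assumed; see the note above).
new_names <- c(cyl = "cylinders", hp = "horsepower", wt = "weight",
               am = "transmission", gear = "gears", carb = "carburators")
idx <- match(names(new_names), names(cars))
names(cars)[idx] <- unname(new_names)

# Convert the categorical variables to factors.
# In mtcars, am is coded 0 = automatic, 1 = manual.
cars$transmission <- factor(cars$transmission, levels = c(0, 1),
                            labels = c("automatic", "manual"))
for (v in c("cylinders", "vs", "gears", "carburators")) {
  cars[[v]] <- factor(cars[[v]])
}

# Inspect the resulting structure.
str(cars)
```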

[2] Shapiro-Wilk Test for Normality and t-Test For Difference in Means

shapiro.test(mtcars$mpg)
## 
##  Shapiro-Wilk normality test
## 
## data:  mtcars$mpg
## W = 0.9476, p-value = 0.1229
with(mtcars, t.test(mpg ~ transmission))
## 
##  Welch Two Sample t-test
## 
## data:  mpg by transmission
## t = -3.767, df = 18.33, p-value = 0.001374
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -11.28  -3.21
## sample estimates:
## mean in group automatic    mean in group manual 
##                   17.15                   24.39

[3] Model Fitting
Perform a simple bivariate regression of MPG on transmission type:

null <- lm(mpg~transmission, data = mtcars)
summary(null)
## 
## Call:
## lm(formula = mpg ~ transmission, data = mtcars)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
## -9.392 -3.092 -0.297  3.244  9.508 
## 
## Coefficients:
##                    Estimate Std. Error t value Pr(>|t|)    
## (Intercept)           17.15       1.12   15.25  1.1e-15 ***
## transmissionmanual     7.24       1.76    4.11  0.00029 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 4.9 on 30 degrees of freedom
## Multiple R-squared:  0.36,   Adjusted R-squared:  0.338 
## F-statistic: 16.9 on 1 and 30 DF,  p-value: 0.000285

Perform a multivariate regression using backward stepwise selection and choose the best model according to the lowest AIC score:

reduced <- step(lm(mpg ~ ., data = mtcars), direction = "backward", trace=0,k=2)
summary(reduced)
## 
## Call:
## lm(formula = mpg ~ cylinders + horsepower + weight + transmission, 
##     data = mtcars)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
## -3.939 -1.256 -0.401  1.125  5.051 
## 
## Coefficients:
##                    Estimate Std. Error t value Pr(>|t|)    
## (Intercept)         33.7083     2.6049   12.94  7.7e-13 ***
## cylinders6          -3.0313     1.4073   -2.15   0.0407 *  
## cylinders8          -2.1637     2.2843   -0.95   0.3523    
## horsepower          -0.0321     0.0137   -2.35   0.0269 *  
## weight              -2.4968     0.8856   -2.82   0.0091 ** 
## transmissionmanual   1.8092     1.3963    1.30   0.2065    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2.41 on 26 degrees of freedom
## Multiple R-squared:  0.866,  Adjusted R-squared:  0.84 
## F-statistic: 33.6 on 5 and 26 DF,  p-value: 1.51e-10
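
For linear models, the AIC that step() compares comes from extractAIC(), which computes n * log(RSS / n) + k * edf. As a rough check on the score of 61.65 quoted for Model 2, the sketch below plugs in the rounded residual sum of squares (151) printed in the ANOVA table later in this appendix, so the result should land close to, but not exactly on, the reported value.

```r
n   <- 32    # observations in mtcars
rss <- 151   # residual sum of squares for Model 2, rounded as printed
edf <- 6     # estimated coefficients: intercept plus five terms
k   <- 2     # penalty per parameter, matching k = 2 in the step() call

# AIC on the scale used by step() for lm fits.
aic_model2 <- n * log(rss / n) + k * edf
round(aic_model2, 2)
```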

Compare the two models using ANOVA:

with(mtcars, anova(lm(mpg ~ transmission), lm(mpg ~ cylinders + horsepower + weight + transmission)))
## Analysis of Variance Table
## 
## Model 1: mpg ~ transmission
## Model 2: mpg ~ cylinders + horsepower + weight + transmission
##   Res.Df RSS Df Sum of Sq    F  Pr(>F)    
## 1     30 721                              
## 2     26 151  4       570 24.5 1.7e-08 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
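
The F statistic in the table above follows from the two residual sums of squares and their degrees of freedom. A small sketch using the rounded values as printed (so the result matches the table only up to rounding):

```r
rss1 <- 721; df1 <- 30   # Model 1: mpg ~ transmission
rss2 <- 151; df2 <- 26   # Model 2: four predictors

# Reduction in RSS per added parameter, relative to the residual
# variance of the larger model.
f_stat <- ((rss1 - rss2) / (df1 - df2)) / (rss2 / df2)

# Upper-tail probability under the F distribution with (4, 26) df.
p_val <- pf(f_stat, df1 - df2, df2, lower.tail = FALSE)

round(f_stat, 1)
p_val
```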

[4] Residual Plots
Plot the residuals of the adjusted and optimized regression model:
[Figure 2: residual plots for Model 2]
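
Diagnostics of this kind can be produced with the plot() method for lm objects. The sketch below is not the original plotting code: it assumes Model 2 is refit on the built-in mtcars columns (cyl, hp, wt, am), so the variable names differ from the renamed ones used elsewhere in this report.

```r
# Refit Model 2 using the original mtcars column names (an assumption;
# the report used renamed columns).
fit <- lm(mpg ~ factor(cyl) + hp + wt + factor(am), data = mtcars)

# 2 x 2 grid of the standard diagnostics: residuals vs fitted,
# normal Q-Q, scale-location, residuals vs leverage.
old_par <- par(mfrow = c(2, 2))
plot(fit)
par(old_par)  # restore the previous graphics settings
```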

[5] Sources
A copy of the assignment can be found here: (https://github.com/bcaffo/courses/blob/master/07_RegressionModels/project/project.md)

A description of the mtcars Motor Trend Car Road Tests data set can be found here: (http://stat.ethz.ch/R-manual/R-devel/library/datasets/html/mtcars.html).