Summary

This analysis investigates a data set from Motor Trend (MT), a magazine about the automobile industry. Using this collection of cars, we explore the relationship between a set of variables and miles per gallon (MPG), the outcome. MT is particularly interested in the following two questions:

  1. Is an automatic or manual transmission better for MPG?

  2. What is the MPG difference between automatic and manual transmissions?

The approach used throughout this analysis is based on general linear models. More specifically, multiple linear regression is used to determine the extent to which the independent variables contribute to MPG. I will also determine whether there is a statistically significant difference between automatic and manual transmissions with respect to MPG.

Data Preprocessing

First, load the dataset and convert variables to factors.

data(mtcars)
mtcars$cyl  <- factor(mtcars$cyl)
mtcars$vs   <- factor(mtcars$vs)
mtcars$gear <- factor(mtcars$gear)
mtcars$carb <- factor(mtcars$carb)
mtcars$am   <- factor(mtcars$am,labels=c("Auto","Manual"))
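
As a quick sanity check on the conversion above, the sketch below confirms the level order and group sizes of the recoded am variable. It assumes the coding given in the mtcars help page (0 = automatic, 1 = manual), so the first label applies to 0 and the second to 1.

```r
# Reload a fresh copy so the check does not depend on earlier steps
data(mtcars)
mtcars$am <- factor(mtcars$am, labels = c("Auto", "Manual"))

# Level order determines the baseline in later regressions:
# "Auto" is the reference level, so the coefficient is named amManual
am_levels <- levels(mtcars$am)
am_counts <- table(mtcars$am)

print(am_levels)
print(am_counts)
```

Because "Auto" is the first level, every am coefficient reported later is the difference of manual relative to automatic.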

Analysis

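Next, I will fit an initial model that includes all available predictors of MPG, then use stepwise selection (step(), which compares candidate models by AIC) to choose a smaller best-fit model.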

model <- lm(mpg ~ ., data=mtcars)
bestfit_model <- step(model, direction="both")
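
To see what the stepwise search achieved, the following sketch refits both models and compares their AIC values and the terms that were kept. The names full_model and selected_model are illustrative only, and trace = 0 is added here just to suppress the step-by-step log.

```r
data(mtcars)
# Same factor conversions as in the preprocessing step
for (v in c("cyl", "vs", "gear", "carb")) mtcars[[v]] <- factor(mtcars[[v]])
mtcars$am <- factor(mtcars$am, labels = c("Auto", "Manual"))

full_model     <- lm(mpg ~ ., data = mtcars)
selected_model <- step(full_model, direction = "both", trace = 0)

# Terms retained by the search, and AIC before and after (lower is better)
kept_terms <- attr(terms(selected_model), "term.labels")
aic_values <- c(full = AIC(full_model), selected = AIC(selected_model))

print(kept_terms)
print(aic_values)
```

Stepwise selection is a heuristic search, so the selected model is best only among the models it visited, not necessarily among all possible subsets.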

Here is a summary of the best model.

summary(bestfit_model)
## 
## Call:
## lm(formula = mpg ~ cyl + hp + wt + am, data = mtcars)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -3.9387 -1.2560 -0.4013  1.1253  5.0513 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 33.70832    2.60489  12.940 7.73e-13 ***
## cyl6        -3.03134    1.40728  -2.154  0.04068 *  
## cyl8        -2.16368    2.28425  -0.947  0.35225    
## hp          -0.03211    0.01369  -2.345  0.02693 *  
## wt          -2.49683    0.88559  -2.819  0.00908 ** 
## amManual     1.80921    1.39630   1.296  0.20646    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2.41 on 26 degrees of freedom
## Multiple R-squared:  0.8659, Adjusted R-squared:  0.8401 
## F-statistic: 33.57 on 5 and 26 DF,  p-value: 1.506e-10

The adjusted R-squared value indicates that this model (with terms cyl6, cyl8, hp, wt, and amManual) explains about 84% of the variance in MPG. Note that the amManual coefficient (1.81, p = 0.21) is not statistically significant at the 0.05 level once cylinders, horsepower, and weight are in the model.
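
A confidence interval gives a more direct view of the uncertainty in the transmission coefficient than the p-value alone. The sketch below refits the selected formula directly (rather than rerunning step()) and extracts the 95% interval for amManual; the name fit is illustrative.

```r
data(mtcars)
mtcars$cyl <- factor(mtcars$cyl)
mtcars$am  <- factor(mtcars$am, labels = c("Auto", "Manual"))

# Same formula as the model selected by step() above
fit <- lm(mpg ~ cyl + hp + wt + am, data = mtcars)

# 95% confidence interval for the adjusted manual-vs-automatic difference
am_ci <- confint(fit, "amManual", level = 0.95)
print(am_ci)
```

An interval that includes zero is consistent with the non-significant t-test for amManual in the summary table.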

Next, we can compare this first model to a model that examines only MPG vs am (automatic vs. manual).

model2 <- lm(mpg ~ am, data=mtcars)
summary(model2)
## 
## Call:
## lm(formula = mpg ~ am, data = mtcars)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -9.3923 -3.0923 -0.2974  3.2439  9.5077 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   17.147      1.125  15.247 1.13e-15 ***
## amManual       7.245      1.764   4.106 0.000285 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 4.902 on 30 degrees of freedom
## Multiple R-squared:  0.3598, Adjusted R-squared:  0.3385 
## F-statistic: 16.86 on 1 and 30 DF,  p-value: 0.000285

In this model, I fit a simple linear regression with am as the predictor variable and mpg as the outcome variable. Essentially, it examines the association between transmission type (automatic vs. manual) and mpg (miles per gallon) without adjusting for any other variable.

The results indicate that transmission type has a statistically significant association with mpg (p < .001): manual cars average about 7.2 MPG more than automatic cars, which average about 17.1 MPG. However, transmission type alone accounts for only about 34% of the variability in MPG (adjusted R-squared), which tells us that other factors besides transmission type contribute to MPG.
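
With a single two-level predictor, the regression is equivalent to comparing the two group means with a pooled-variance t-test. The sketch below illustrates this equivalence; the object names are illustrative.

```r
data(mtcars)
mtcars$am <- factor(mtcars$am, labels = c("Auto", "Manual"))

# Mean MPG per transmission type
group_means <- tapply(mtcars$mpg, mtcars$am, mean)

# Simple regression: intercept = Auto mean, slope = Manual minus Auto
fit_am <- lm(mpg ~ am, data = mtcars)

# Two-sample t-test with equal variances assumed, matching the lm() assumptions
tt <- t.test(mpg ~ am, data = mtcars, var.equal = TRUE)

print(group_means)
print(coef(fit_am))
print(tt$p.value)
```

The default t.test() uses the Welch correction (var.equal = FALSE), which would give a slightly different p-value from the regression.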

We can see that both models are statistically significant, but the best model above shows that factors other than transmission type are important in determining MPG. Because the am-only model is nested within the best model, the two can be compared with an F-test.

a <- anova(bestfit_model, model2)
a
## Analysis of Variance Table
## 
## Model 1: mpg ~ cyl + hp + wt + am
## Model 2: mpg ~ am
##   Res.Df    RSS Df Sum of Sq      F    Pr(>F)    
## 1     26 151.03                                  
## 2     30 720.90 -4   -569.87 24.527 1.688e-08 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Based on these results, the p-value is highly significant (p < .001), indicating that adding cyl, hp, and wt significantly improves the fit over the model that uses am alone.
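
The F statistic in the table can be reproduced by hand from the residual sums of squares and degrees of freedom of the two models. The sketch below shows this calculation; small_fit and large_fit are illustrative names for the am-only and best-fit models.

```r
data(mtcars)
mtcars$cyl <- factor(mtcars$cyl)
mtcars$am  <- factor(mtcars$am, labels = c("Auto", "Manual"))

small_fit <- lm(mpg ~ am, data = mtcars)
large_fit <- lm(mpg ~ cyl + hp + wt + am, data = mtcars)

# Residual sums of squares and residual degrees of freedom
rss_small <- deviance(small_fit); df_small <- df.residual(small_fit)
rss_large <- deviance(large_fit); df_large <- df.residual(large_fit)

# F = (drop in RSS per added parameter) / (residual variance of larger model)
extra_df <- df_small - df_large
f_stat   <- ((rss_small - rss_large) / extra_df) / (rss_large / df_large)
p_value  <- pf(f_stat, extra_df, df_large, lower.tail = FALSE)

print(c(F = f_stat, p = p_value))
```

Passing the smaller model first, as in anova(small_fit, large_fit), gives the same F statistic with positive Df and Sum of Sq entries.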

Diagnostics and Plots

par(mfrow = c(2,2))
plot(bestfit_model)

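The Q-Q plot above shows that the residuals are approximately normally distributed. In the residuals vs. fitted plot, the residuals are fairly randomly dispersed with no clear pattern, which supports the linearity and constant-variance assumptions. These plots do not assess multicollinearity, which depends on the relationships among the predictors themselves.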
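
Multicollinearity can be checked with variance inflation factors (VIFs). The following is a minimal base-R sketch that computes one VIF per column of the model matrix by regressing each column on the others. This per-column approach is an assumption made for illustration; for a multi-level factor such as cyl, a generalized VIF (as reported by some add-on packages) is the more usual summary.

```r
data(mtcars)
mtcars$cyl <- factor(mtcars$cyl)
mtcars$am  <- factor(mtcars$am, labels = c("Auto", "Manual"))

fit <- lm(mpg ~ cyl + hp + wt + am, data = mtcars)

# Design matrix without the intercept column
X <- model.matrix(fit)[, -1]

# VIF_j = 1 / (1 - R^2_j), where R^2_j comes from regressing
# column j on all the other predictor columns
vif <- sapply(colnames(X), function(j) {
  others <- setdiff(colnames(X), j)
  r2 <- summary(lm(X[, j] ~ X[, others]))$r.squared
  1 / (1 - r2)
})

print(round(vif, 2))
```

Values near 1 indicate little overlap with the other predictors; larger values indicate that a coefficient's standard error is inflated by correlated predictors.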

Conclusions

From the above analysis, the answers to the initial questions are as follows.

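  1. Yes, manual transmissions tend to get more MPG than automatic transmissions. In the unadjusted comparison, the difference is about 7.2 MPG and is statistically significant.

  2. Manual transmissions get approximately 1.8 MPG more than automatic transmissions, taking into consideration other factors such as number of cylinders, horsepower, and weight of the car. This adjusted difference is not statistically significant (p = 0.21).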

Appendix

  1. Comparison of automatic and manual transmissions via boxplot.
boxplot(mpg ~ am, data=mtcars)

  2. Correlations between variables (strong correlations among predictors would indicate a multicollinearity problem)
mdata <- mtcars[, c("mpg", "wt", "qsec", "am")]   # MPG, WT, QSEC, and AM
par(mar=c(1,1,1,1))
pairs(mdata, panel = panel.smooth, col = 9 + mtcars$wt)
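
The pairs plot can be complemented with a numeric correlation matrix. The sketch below uses a fresh copy of mtcars, where all columns are still numeric, and an illustrative choice of continuous variables (mpg, hp, wt, qsec).

```r
# Fresh copy: cor() needs numeric columns, so no factor conversion here
data(mtcars)

vars <- c("mpg", "hp", "wt", "qsec")
cors <- cor(mtcars[, vars])

print(round(cors, 2))
```

Large absolute correlations between two predictors suggest they carry overlapping information about MPG.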