Executive Summary

Motor Trend, a magazine about the automobile industry, is interested in the relationship between a set of variables and miles per gallon (MPG), the outcome. This report addresses two questions: whether an automatic or a manual transmission is better for MPG, and how large the MPG difference between the two transmission types is.

Using hypothesis testing and simple and multivariate linear regression, we found a difference in mean mpg between automatic and manual transmission cars: on average, manual cars get 7.24 more mpg. We then adjusted the model for confounding variables, namely the weight and horsepower of the car, to isolate the effect of transmission type. The two models were compared and validated using ANOVA. In the multivariate regression, manual transmission cars average an estimated 2.084 more mpg than automatic cars with weight and horsepower held constant, but this adjusted difference is not statistically significant at the 0.05 level (p = 0.141).

Exploratory Data Analysis

Load the dataset and examine it with the pairs function, which plots a thumbnail scatterplot for every pair of variables (Figure A-1). The dataset records 11 variables for each car model.

data(mtcars); names(mtcars)
##  [1] "mpg"  "cyl"  "disp" "hp"   "drat" "wt"   "qsec" "vs"   "am"   "gear"
## [11] "carb"

The predictor variable for the transmission type, am, is numeric. Convert it to a factor and label the levels as Automatic and Manual.

mtcars$am <- as.factor(mtcars$am); levels(mtcars$am) <- c("Automatic", "Manual"); head(mtcars,n=3)
##                mpg cyl disp  hp drat    wt  qsec vs     am gear carb
## Mazda RX4     21.0   6  160 110 3.90 2.620 16.46  0 Manual    4    4
## Mazda RX4 Wag 21.0   6  160 110 3.90 2.875 17.02  0 Manual    4    4
## Datsun 710    22.8   4  108  93 3.85 2.320 18.61  1 Manual    4    1
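
The conversion above assigns labels by position, which relies on the level order being 0 then 1. As a minimal sketch, the mapping can be made explicit with factor() and the group sizes checked with table(). The copy named mt is an illustrative name, not part of the original analysis.

```r
# Sketch: explicit level-to-label mapping on a copy of the data.
# `mt` is a hypothetical name used only for this illustration.
data(mtcars)
mt <- mtcars

# In mtcars, am = 0 is automatic and am = 1 is manual.
mt$am <- factor(mt$am, levels = c(0, 1), labels = c("Automatic", "Manual"))

# Count how many cars fall in each transmission group.
am_counts <- table(mt$am)
print(am_counts)
```

The two groups are unequal in size, which is worth keeping in mind when comparing their means.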

Regression Analysis

Determine the effect of a car's transmission type on mpg. Look at the distribution of mpg for each level of am (Automatic or Manual) with a box plot (Figure A-2). The plot shows that manual transmission cars tend to have higher mpg. The group means confirm this: the mean mpg of manual transmission cars is 7.24 mpg higher than that of automatic transmission cars.

aggregate(mpg~am, data = mtcars, mean)
##          am   mpg
## 1 Automatic 17.15
## 2    Manual 24.39
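
The summary mentions hypothesis testing, but the test itself is not shown. One way to test the difference in group means directly is a Welch two-sample t-test. This is a sketch under that assumption and not necessarily the test the original analysis ran.

```r
# Sketch: Welch two-sample t-test of mpg by transmission type.
# am is left numeric here (0 = automatic, 1 = manual).
data(mtcars)

# t.test() uses the Welch (unequal variance) version by default.
tt <- t.test(mpg ~ am, data = mtcars)

print(tt$estimate)  # mean mpg in each group
print(tt$p.value)   # p-value for the null hypothesis of equal means
print(tt$conf.int)  # 95% confidence interval for the difference in means
```

A p-value below 0.05 here would agree with the conclusion that the unadjusted means differ.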

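Determine candidate predictors for the model by examining the mpg row of the correlation matrix. The dataset is reloaded first so that am is numeric again, since cor requires numeric columns. From here on am is coded 0 for automatic and 1 for manual, which is why the regression output below labels the coefficient am.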

data(mtcars); sort(cor(mtcars)[1,])
##      wt     cyl    disp      hp    carb    qsec    gear      am      vs 
## -0.8677 -0.8522 -0.8476 -0.7762 -0.5509  0.4187  0.4803  0.5998  0.6640 
##    drat     mpg 
##  0.6812  1.0000

It appears that wt, cyl, disp, and hp are strongly negatively correlated with our dependent variable mpg (correlations between -0.78 and -0.87) and may be good candidates for the model. This makes sense, as heavier cars with more horsepower tend to have lower mpg.
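
The report does not say why only wt and hp enter the later model. One plausible consideration, offered here as an assumption, is that the four candidates are also correlated with each other, so adding all of them would contribute overlapping information. A minimal sketch to inspect this:

```r
# Sketch: correlations among the four candidate predictors.
# `candidates` and `pred_cor` are illustrative names.
data(mtcars)
candidates <- c("wt", "cyl", "disp", "hp")

# Pairwise Pearson correlations between the candidates only.
pred_cor <- cor(mtcars[, candidates])
print(round(pred_cor, 2))
```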

Linear Regression

fit <- lm(mpg~am, data = mtcars); summary(fit)
## 
## Call:
## lm(formula = mpg ~ am, data = mtcars)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
## -9.392 -3.092 -0.297  3.244  9.508 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)    17.15       1.12   15.25  1.1e-15 ***
## am              7.24       1.76    4.11  0.00029 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 4.9 on 30 degrees of freedom
## Multiple R-squared:  0.36,   Adjusted R-squared:  0.338 
## F-statistic: 16.9 on 1 and 30 DF,  p-value: 0.000285

The coefficient table gives the hypothesis test for the model: the p-value for am (0.00029) is well below 0.05, so the difference is statistically significant. From the intercept and the coefficient, we see that automatic cars average 17.15 mpg, while manual cars get 7.24 more miles per gallon. However, the R-squared value is 0.36, which means the model explains only 36% of the variance in mpg.
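
With a single 0/1 predictor, the intercept is the mean of the group coded 0 and the intercept plus the slope is the mean of the group coded 1. The following sketch checks that reading of the coefficients against the group means; the names b and group_means are illustrative.

```r
# Sketch: relate the simple regression coefficients to the group means.
data(mtcars)
fit <- lm(mpg ~ am, data = mtcars)

# Mean mpg per transmission group (names are "0" and "1").
group_means <- tapply(mtcars$mpg, mtcars$am, mean)

b <- coef(fit)
# Intercept = automatic mean; intercept + am coefficient = manual mean.
model_means <- c(automatic = unname(b["(Intercept)"]),
                 manual    = unname(b["(Intercept)"] + b["am"]))

print(model_means)
print(group_means)
```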

Multivariate Linear Regression

Now we fit a multivariate linear regression of mpg on am with the additional predictors wt and hp, and use ANOVA to test whether the larger model fits significantly better than the first.

multifit <- lm(mpg~am + wt + hp, data = mtcars); anova(fit, multifit)
## Analysis of Variance Table
## 
## Model 1: mpg ~ am
## Model 2: mpg ~ am + wt + hp
##   Res.Df RSS Df Sum of Sq  F  Pr(>F)    
## 1     30 721                            
## 2     28 180  2       541 42 3.7e-09 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

We use 0.05 as the significance level (type I error rate). The F-test comparing the two models has a very small p-value of 3.7e-09, so we reject the null hypothesis that the added predictors wt and hp do not improve the fit: the multivariate model differs significantly from the initial model.
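
The F statistic in the ANOVA table can be reproduced from the residual sums of squares of the two nested models. The sketch below does the arithmetic by hand; the variable names are illustrative, and the formula is the standard nested-model F-test.

```r
# Sketch: recompute the nested-model F statistic from the two RSS values.
data(mtcars)
fit      <- lm(mpg ~ am, data = mtcars)
multifit <- lm(mpg ~ am + wt + hp, data = mtcars)

rss1 <- deviance(fit)        # residual sum of squares, smaller model
rss2 <- deviance(multifit)   # residual sum of squares, larger model

df_diff <- df.residual(fit) - df.residual(multifit)  # predictors added
df_res  <- df.residual(multifit)                     # residual df, larger model

# F = (drop in RSS per added predictor) / (residual variance of larger model)
f_stat <- ((rss1 - rss2) / df_diff) / (rss2 / df_res)
p_val  <- pf(f_stat, df_diff, df_res, lower.tail = FALSE)

print(c(F = f_stat, p = p_val))
```

With the rounded values from the table, this is roughly (541 / 2) / (180 / 28), which is about 42.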

Residual Plot

We use a residual plot to validate the model (Figure A-3) and then examine the estimates from the multivariate model.

summary(multifit)
## 
## Call:
## lm(formula = mpg ~ am + wt + hp, data = mtcars)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
## -3.422 -1.792 -0.379  1.225  5.532 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 34.00288    2.64266   12.87  2.8e-13 ***
## am           2.08371    1.37642    1.51  0.14127    
## wt          -2.87858    0.90497   -3.18  0.00357 ** 
## hp          -0.03748    0.00961   -3.90  0.00055 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2.54 on 28 degrees of freedom
## Multiple R-squared:  0.84,   Adjusted R-squared:  0.823 
## F-statistic:   49 on 3 and 28 DF,  p-value: 2.91e-11

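The second model explains 84% of the variance, as indicated by R-squared. The am coefficient drops from 7.24 to 2.084 once wt and hp are included, which shows that wt and hp confound the relationship between am and mpg. Holding weight and horsepower constant, manual transmission cars get an estimated 2.084 more mpg than automatic cars, but with a p-value of 0.141 this difference is not statistically significant at the 0.05 level.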
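
A confidence interval makes the uncertainty in the adjusted am estimate concrete. As a minimal sketch, confint() gives the 95% interval for the am coefficient in the multivariate model; the name ci_am is illustrative.

```r
# Sketch: 95% confidence interval for the am coefficient after
# adjusting for weight and horsepower.
data(mtcars)
multifit <- lm(mpg ~ am + wt + hp, data = mtcars)

# Returns a 1 x 2 matrix with the lower and upper bounds.
ci_am <- confint(multifit, "am", level = 0.95)
print(ci_am)
```

An interval that spans zero is consistent with the p-value of 0.141 reported for am in the summary.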

Appendix A - Supporting Figures

Figure A-1 Pairs Scatterplot
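
The figure can be regenerated with a call along these lines. This is a sketch: the smoothing panel and the title are illustrative choices, not necessarily the options used for the original figure.

```r
# Sketch: scatterplot matrix of all 11 variables in mtcars.
data(mtcars)

# panel.smooth adds a smoothed trend line to each panel (optional).
pairs(mtcars, panel = panel.smooth, main = "Pairs plot of mtcars")
```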

Figure A-2 Distribution of miles per gallon versus transmission type
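
A box plot like the one described can be produced as follows. This is a sketch: the axis labels, title, and the name am_f are illustrative assumptions.

```r
# Sketch: box plot of mpg for each transmission type.
data(mtcars)

# Label the transmission groups for the plot without modifying mtcars.
am_f <- factor(mtcars$am, levels = c(0, 1), labels = c("Automatic", "Manual"))

bp <- boxplot(mtcars$mpg ~ am_f,
              xlab = "Transmission", ylab = "Miles per gallon",
              main = "MPG by transmission type")

# Row 3 of bp$stats holds the median of each group.
print(bp$stats[3, ])
```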

Figure A-3 Residual plot for the multivariate model
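
The standard lm diagnostic plots can be drawn as sketched below. The 2 x 2 layout is an assumption about how the original figure was arranged.

```r
# Sketch: diagnostic plots for the multivariate model.
data(mtcars)
multifit <- lm(mpg ~ am + wt + hp, data = mtcars)

# Draw the four default lm diagnostics in a 2 x 2 grid, then restore par.
op <- par(mfrow = c(2, 2))
plot(multifit)
par(op)

# Residuals of a least-squares fit with an intercept average to zero.
res <- resid(multifit)
print(summary(res))
```

In the residuals versus fitted panel, the absence of a clear pattern would support the linear model; a visible curve or funnel shape would suggest a missing term or non-constant variance.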
