October 26, 2014
======================================================================================================================== ### Executive Summary
This study uses regression analysis to explore the relationship between a set of variables in the mtcars dataset. The main goal is to answer the following questions:
The remainder of this document explains how and why I reached the following answers to these questions.
Yes, a manual transmission is better for MPG than an automatic transmission. On average, manual transmission cars in this dataset travel 7.24 more miles per gallon than automatic transmission cars:
am avg_mpg stdev_mpg
manual 24.39231 6.166504
automatic 17.14737 3.833966
To adjust for confounding variables such as the weight and horsepower of the car, multivariate regression gives a better estimate of the impact of transmission type on MPG. In a linear model with transmission type, weight, and horsepower as predictors, manual transmission cars get an estimated 2.084 miles per gallon more than automatic transmission cars, although this adjusted difference is not statistically significant (p = 0.141).
======================================================================================================================== ### Data Processing
require(plyr)
## Loading required package: plyr
require(ggplot2)
## Loading required package: ggplot2
library(datasets)
data(mtcars)
str(mtcars)
## 'data.frame': 32 obs. of 11 variables:
## $ mpg : num 21 21 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2 ...
## $ cyl : num 6 6 4 6 8 6 8 4 4 6 ...
## $ disp: num 160 160 108 258 360 ...
## $ hp : num 110 110 93 110 175 105 245 62 95 123 ...
## $ drat: num 3.9 3.9 3.85 3.08 3.15 2.76 3.21 3.69 3.92 3.92 ...
## $ wt : num 2.62 2.88 2.32 3.21 3.44 ...
## $ qsec: num 16.5 17 18.6 19.4 17 ...
## $ vs : num 0 0 1 1 0 1 0 1 1 1 ...
## $ am : num 1 1 1 0 0 0 0 0 0 0 ...
## $ gear: num 4 4 4 3 3 3 3 4 4 4 ...
## $ carb: num 4 4 1 1 2 1 4 2 2 4 ...
The variables present in the dataset are:
* mpg - Miles/(US) gallon
* cyl - Number of cylinders
* disp - Displacement (cu.in.)
* hp - Gross horsepower
* drat - Rear axle ratio
* wt - Weight (lb/1000)
* qsec - 1/4 mile time
* vs - V/S
* am - Transmission (0 = automatic, 1 = manual)
* gear - Number of forward gears
* carb - Number of carburetors
The dataset currently represents the transmission type as a 1 or a 0. For readability, we will recode it as a factor variable with the labels manual and automatic.
mtcars$am.factor <- factor(mtcars$am, levels = c(1, 0), labels = c('manual', 'automatic'))
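As a quick sanity check on the recoding, the sketch below builds the factor in two equivalent ways (an indexing expression and an explicit `levels`/`labels` mapping) and cross-tabulates the result against the raw `am` codes. The names `am_indexed` and `am_explicit` are illustrative and not part of the analysis itself.

```r
data(mtcars)

# Indexing approach: pick from a two-element factor, then make 'manual' the first level
am_indexed <- relevel(factor(c("manual", "automatic"))[2 - mtcars$am], "manual")

# Explicit mapping of the raw codes to labels (1 = manual, 0 = automatic)
am_explicit <- factor(mtcars$am, levels = c(1, 0), labels = c("manual", "automatic"))

# Each raw code should map to exactly one label
print(table(raw = mtcars$am, recoded = am_explicit))

# Both constructions should agree element by element and in level order
same_values <- all(as.character(am_indexed) == as.character(am_explicit))
same_levels <- identical(levels(am_indexed), levels(am_explicit))
```

If both flags are `TRUE`, either expression can be used to create `am.factor`.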
========================================================================================================================
To answer this question, let's compute the average and standard deviation of MPG for each transmission type. This helps us determine whether there is a noticeable difference in fuel efficiency between the transmission types.
data.frame(am=levels(mtcars$am.factor),
avg_mpg=aggregate(mtcars$mpg,
by=list(mtcars$am.factor), mean)$x,
stdev_mpg=aggregate(mtcars$mpg, by=list(mtcars$am.factor), sd)$x)
## am avg_mpg stdev_mpg
## 1 manual 24.39 6.167
## 2 automatic 17.15 3.834
On average, a manual transmission is better for MPG than an automatic transmission: manual transmission cars travel 7.24 more miles per gallon than automatic transmission cars.
A simple regression model for MPG with am as the single predictor:
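One way to check whether a gap of this size could be due to chance is a Welch two-sample t-test, which does not assume equal variances in the two groups. This is a minimal sketch using base R; the variable names `tt` and `mean_diff` are illustrative.

```r
data(mtcars)

# Welch two-sample t-test of MPG by transmission (am: 0 = automatic, 1 = manual)
tt <- t.test(mpg ~ am, data = mtcars)

# Difference in group means, manual minus automatic
mean_diff <- unname(tt$estimate[2] - tt$estimate[1])

print(tt)
cat("Difference in mean MPG (manual - automatic):", round(mean_diff, 2), "\n")
```

The test compares the two group means only; it does not adjust for other variables such as weight or horsepower.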
mpg.am <- lm(mpg~am, data=mtcars)
summary(mpg.am)
##
## Call:
## lm(formula = mpg ~ am, data = mtcars)
##
## Residuals:
## Min 1Q Median 3Q Max
## -9.392 -3.092 -0.297 3.244 9.508
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 17.15 1.12 15.25 1.1e-15 ***
## am 7.24 1.76 4.11 0.00029 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 4.9 on 30 degrees of freedom
## Multiple R-squared: 0.36, Adjusted R-squared: 0.338
## F-statistic: 16.9 on 1 and 30 DF, p-value: 0.000285
Interpreting:
* Intercept = 17.147, which is the mean MPG of the automatic cars (am = 0).
* AM Coefficient = 7.244, which is the difference in mean MPG between manual and automatic cars (manual minus automatic), so the manual mean is 17.147 + 7.244 = 24.392.
* R-squared value = 0.3598, which means that the model explains only 35.98% of the variance in MPG.
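With a single 0/1 predictor, the fitted coefficients can be read directly off the group means. The sketch below checks that relationship numerically; `fit`, `group_means`, and the two logical flags are illustrative names.

```r
data(mtcars)

fit <- lm(mpg ~ am, data = mtcars)
group_means <- tapply(mtcars$mpg, mtcars$am, mean)  # names are "0" and "1"

intercept <- unname(coef(fit)["(Intercept)"])
am_coef <- unname(coef(fit)["am"])

# The intercept is the mean MPG for am = 0 (automatic)
intercept_is_auto_mean <- isTRUE(all.equal(intercept, unname(group_means["0"])))

# The intercept plus the am coefficient is the mean MPG for am = 1 (manual)
sum_is_manual_mean <- isTRUE(all.equal(intercept + am_coef, unname(group_means["1"])))

print(c(intercept = intercept, am = am_coef))
print(group_means)
```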
Please see the Appendix - Figure 1: MPG by Transmission Type to see a visual representation.
========================================================================================================================
To answer this question, let's take a look at the correlation between MPG and transmission type.
cor(mtcars$am, mtcars$mpg)
## [1] 0.5998
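The correlation value = 0.5998, which shows a moderate positive correlation: because am is coded 1 for manual, manual cars tend to have higher MPG.
To account for other variables, we fit a second model that adds weight (wt) and horsepower (hp). Because the simple model is nested within it, we can use ANOVA to test whether the larger model fits significantly better.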
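The `cor()` call returns only the coefficient. As a sketch of how its significance could be tested, `cor.test()` from base R reports a p-value, and squaring the coefficient reproduces the R-squared of the simple model. The names `ct`, `r`, and `r_squared_simple` are illustrative.

```r
data(mtcars)

# Pearson correlation with a significance test
ct <- cor.test(mtcars$am, mtcars$mpg)
r <- unname(ct$estimate)

# For a one-predictor linear model, R-squared equals the squared correlation
r_squared_simple <- summary(lm(mpg ~ am, data = mtcars))$r.squared

print(ct)
cat("r^2 =", round(r^2, 4), " R-squared =", round(r_squared_simple, 4), "\n")
```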
bestfit <- lm(mpg~am + wt + hp, data = mtcars)
anova(mpg.am, bestfit)
## Analysis of Variance Table
##
## Model 1: mpg ~ am
## Model 2: mpg ~ am + wt + hp
## Res.Df RSS Df Sum of Sq F Pr(>F)
## 1 30 721
## 2 28 180 2 541 42 3.7e-09 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Interpreting:
* The P-Value = 3.745e-09, which leads us to reject the null hypothesis that adding wt and hp does not improve the fit. The multivariate model fits significantly better than the simple model.
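The F statistic in the ANOVA table can be reproduced by hand from the residual sums of squares and residual degrees of freedom of the two nested models. This is a minimal sketch; `small` and `big` are illustrative names for the two fits.

```r
data(mtcars)

small <- lm(mpg ~ am, data = mtcars)
big <- lm(mpg ~ am + wt + hp, data = mtcars)

# Residual sums of squares and residual degrees of freedom
rss_small <- deviance(small)
rss_big <- deviance(big)
df_small <- df.residual(small)
df_big <- df.residual(big)

# F = (drop in RSS per added term) / (residual variance of the larger model)
f_stat <- ((rss_small - rss_big) / (df_small - df_big)) / (rss_big / df_big)
p_value <- pf(f_stat, df_small - df_big, df_big, lower.tail = FALSE)

cat("F =", round(f_stat, 2), " p =", signif(p_value, 3), "\n")
```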
Final Model
summary(bestfit)
##
## Call:
## lm(formula = mpg ~ am + wt + hp, data = mtcars)
##
## Residuals:
## Min 1Q Median 3Q Max
## -3.422 -1.792 -0.379 1.225 5.532
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 34.00288 2.64266 12.87 2.8e-13 ***
## am 2.08371 1.37642 1.51 0.14127
## wt -2.87858 0.90497 -3.18 0.00357 **
## hp -0.03748 0.00961 -3.90 0.00055 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.54 on 28 degrees of freedom
## Multiple R-squared: 0.84, Adjusted R-squared: 0.823
## F-statistic: 49 on 3 and 28 DF, p-value: 2.91e-11
Interpreting:
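* R-squared = 0.8399, which means that the model explains 83.99% of the variance in MPG.
* Estimate am = 2.083710, which suggests that manual transmission cars get about 2.08 more MPG than automatic transmission cars of the same weight and horsepower. With a p-value of 0.141, however, this difference is not statistically significant at the 5% level.
* Weight (wt) and horsepower (hp) do indeed confound the relationship between am and mpg (mostly wt): once they are included, the am estimate drops from 7.24 to 2.08.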
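To make the uncertainty around the adjusted estimate concrete, the sketch below compares the am coefficient before and after adjustment and computes a 95% confidence interval with `confint()`. Because the p-value for am is above 0.05, that interval includes zero. The names `simple`, `adjusted`, `am_estimates`, and `am_ci` are illustrative.

```r
data(mtcars)

simple <- lm(mpg ~ am, data = mtcars)
adjusted <- lm(mpg ~ am + wt + hp, data = mtcars)

# am coefficient without and with adjustment for weight and horsepower
am_estimates <- c(simple = unname(coef(simple)["am"]),
                  adjusted = unname(coef(adjusted)["am"]))

# 95% confidence interval for the adjusted am coefficient (1 x 2 matrix)
am_ci <- confint(adjusted, "am", level = 0.95)

print(round(am_estimates, 3))
print(round(am_ci, 3))
```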
Please see the Appendix - Figure 2: Residuals vs. Fitted Values to check that the residuals are approximately normally distributed and homoskedastic.
======================================================================================================================== ### Appendix
boxplot(mpg~am.factor, data = mtcars,
col = c("red", "blue"),
main = "MPG by Transmission Type",
xlab = "Transmission",
ylab = "Miles per Gallon")
The mean MPG is 24.39 for manual transmission cars (red box) and 17.15 for automatic transmission cars (blue box). Note that the line inside each box marks the group median, not the mean.
It is important to check the residuals for any signs of non-normality and to examine the residuals vs. fitted values plot for any signs of heteroskedasticity.
par(mfrow = c(2,2))
plot(bestfit)
The plots suggest that the residuals are approximately normally distributed and homoskedastic.
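Visual checks can be complemented with formal tests. The sketch below is one possible approach using only base R: a Shapiro-Wilk test for normality of the residuals, and a hand-computed studentized (Koenker) Breusch-Pagan statistic for heteroskedasticity, obtained by regressing the squared residuals on the model's predictors. The test outcomes are not part of the original analysis and are not reported here; small p-values would count as evidence against the respective assumption.

```r
data(mtcars)

bestfit <- lm(mpg ~ am + wt + hp, data = mtcars)
res <- residuals(bestfit)

# Shapiro-Wilk test: the null hypothesis is that the residuals are normal
sw <- shapiro.test(res)

# Studentized Breusch-Pagan test: regress squared residuals on the predictors;
# under homoskedasticity, n * R^2 is approximately chi-squared with 3 df
sq_res <- res^2
aux <- lm(sq_res ~ am + wt + hp, data = mtcars)
bp_stat <- nrow(mtcars) * summary(aux)$r.squared
bp_p <- pchisq(bp_stat, df = 3, lower.tail = FALSE)

cat("Shapiro-Wilk p-value:", signif(sw$p.value, 3), "\n")
cat("Breusch-Pagan statistic:", round(bp_stat, 3), " p-value:", signif(bp_p, 3), "\n")
```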