This report explores the relationship between a set of variables in the mtcars data set and miles per gallon (MPG), the outcome. Two questions are addressed:

  1. “Is an automatic or manual transmission better for MPG?”
  2. “How different is the MPG between automatic and manual transmissions?”

The required packages are loaded:

library(datasets)
require(stats)
require(graphics)
require(car)
## Loading required package: car
require(MASS)
## Loading required package: MASS
require(knitr)
## Loading required package: knitr
require(markdown)
## Loading required package: markdown

Read in the data and perform exploratory analysis:

data(mtcars)
str(mtcars)
## 'data.frame':    32 obs. of  11 variables:
##  $ mpg : num  21 21 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2 ...
##  $ cyl : num  6 6 4 6 8 6 8 4 4 6 ...
##  $ disp: num  160 160 108 258 360 ...
##  $ hp  : num  110 110 93 110 175 105 245 62 95 123 ...
##  $ drat: num  3.9 3.9 3.85 3.08 3.15 2.76 3.21 3.69 3.92 3.92 ...
##  $ wt  : num  2.62 2.88 2.32 3.21 3.44 ...
##  $ qsec: num  16.5 17 18.6 19.4 17 ...
##  $ vs  : num  0 0 1 1 0 1 0 1 1 1 ...
##  $ am  : num  1 1 1 0 0 0 0 0 0 0 ...
##  $ gear: num  4 4 4 3 3 3 3 4 4 4 ...
##  $ carb: num  4 4 1 1 2 1 4 2 2 4 ...
pairs(mtcars, panel=panel.smooth, main="MTCars Data")

[Figure: scatterplot matrix of the mtcars variables, titled "MTCars Data"]
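
The visual impression from the scatterplot matrix can be complemented with numbers. The following is a minimal sketch, not part of the original analysis, that tabulates the Pearson correlation of each variable with mpg. The name `mpg_cor` is made up for illustration, and coded variables such as cyl, vs, and am are treated as numeric here, so their values are only a rough guide.

```r
# Correlation of every other variable with mpg, sorted by absolute strength.
data(mtcars)

mpg_cor <- cor(mtcars)[, "mpg"]                  # Pearson correlations with mpg
mpg_cor <- mpg_cor[names(mpg_cor) != "mpg"]      # drop the trivial self-correlation
mpg_cor <- mpg_cor[order(abs(mpg_cor), decreasing = TRUE)]

print(round(mpg_cor, 2))
```

Sorting by absolute value puts strong negative relationships (for example with wt) next to strong positive ones, which is what matters when choosing candidate predictors.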

fit<-glm(mpg~as.factor(cyl) + as.factor(vs) + as.factor(am) + as.factor(gear) + as.factor(carb) + disp + hp + drat + wt + qsec, data=mtcars)
#anova(fit)

Based on the scatterplot matrix of the outcome and the predictors, variables such as cyl, disp, hp, drat, wt, vs, and am appear strongly correlated with mpg. We built the full model and performed stepwise model selection by AIC to choose the predictors for the final model.

step <- stepAIC(fit, direction="both")
step$anova 
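
To see what the stepwise search retained without reading the full trace, the selected formula and the AIC of both models can be printed directly. This is a sketch under the same model specification as above. The object names `full_fit` and `sel_fit` are hypothetical, and `trace = FALSE` is used only to keep the output short.

```r
library(MASS)   # provides stepAIC()

data(mtcars)
full_fit <- glm(mpg ~ as.factor(cyl) + as.factor(vs) + as.factor(am) +
                  as.factor(gear) + as.factor(carb) + disp + hp + drat + wt + qsec,
                data = mtcars)

# trace = FALSE suppresses the step-by-step log of added and dropped terms
sel_fit <- stepAIC(full_fit, direction = "both", trace = FALSE)

print(formula(sel_fit))                                   # terms kept by the search
print(c(full = AIC(full_fit), selected = AIC(sel_fit)))   # lower AIC is preferred
```

Note that AIC-based selection keeps terms that improve the trade-off between fit and model size. A retained term is not necessarily significant at the 0.05 level.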

Four variables (cyl, am, hp, and wt) were included in the final model.

fit<-glm(mpg ~ as.factor(cyl) + as.factor(am) + hp + wt, data=mtcars)
summary(fit)
## 
## Call:
## glm(formula = mpg ~ as.factor(cyl) + as.factor(am) + hp + wt, 
##     data = mtcars)
## 
## Deviance Residuals: 
##    Min      1Q  Median      3Q     Max  
## -3.939  -1.256  -0.401   1.125   5.051  
## 
## Coefficients:
##                 Estimate Std. Error t value Pr(>|t|)    
## (Intercept)      33.7083     2.6049   12.94  7.7e-13 ***
## as.factor(cyl)6  -3.0313     1.4073   -2.15   0.0407 *  
## as.factor(cyl)8  -2.1637     2.2843   -0.95   0.3523    
## as.factor(am)1    1.8092     1.3963    1.30   0.2065    
## hp               -0.0321     0.0137   -2.35   0.0269 *  
## wt               -2.4968     0.8856   -2.82   0.0091 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for gaussian family taken to be 5.809)
## 
##     Null deviance: 1126.05  on 31  degrees of freedom
## Residual deviance:  151.03  on 26  degrees of freedom
## AIC: 154.5
## 
## Number of Fisher Scoring iterations: 2
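
Because glm() defaults to the gaussian family with an identity link, this fit is an ordinary least-squares regression, and the deviances above imply an R-squared of about 1 - 151.03/1126.05 = 0.87. The sketch below checks both points by refitting the same formula with lm(). The names `glm_fit`, `lm_fit`, `r2_lm`, and `r2_glm` are illustrative only.

```r
data(mtcars)
glm_fit <- glm(mpg ~ as.factor(cyl) + as.factor(am) + hp + wt, data = mtcars)
lm_fit  <- lm(mpg ~ as.factor(cyl) + as.factor(am) + hp + wt, data = mtcars)

# Both fitting functions should give the same coefficient estimates
print(all.equal(coef(glm_fit), coef(lm_fit)))

# R-squared two ways: reported by lm, and computed from the glm deviances
r2_lm  <- summary(lm_fit)$r.squared
r2_glm <- 1 - glm_fit$deviance / glm_fit$null.deviance
print(c(lm = r2_lm, glm = r2_glm))
```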

The residual plots suggest that the final model is a decent model for exploring the relationship between mpg and the predictors.

layout(matrix(c(1,2,3,4),2,2)) # optional 4 graphs/page 
plot(fit)

[Figure: diagnostic plots for the final model]
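
The diagnostic plots can be backed up with a couple of numerical checks. The following sketch, which was not part of the original analysis, runs a Shapiro-Wilk test on the residuals and lists the observations with the highest leverage. `final_fit`, `res`, `sw`, and `lev` are hypothetical names.

```r
data(mtcars)
final_fit <- glm(mpg ~ as.factor(cyl) + as.factor(am) + hp + wt, data = mtcars)

# Deviance residuals; for a gaussian glm these equal observed minus fitted mpg
res <- residuals(final_fit)

# Shapiro-Wilk test, H0: the residuals are normally distributed
sw <- shapiro.test(res)
print(sw)

# The three cars with the largest leverage (hat values)
lev <- hatvalues(final_fit)
print(head(sort(lev, decreasing = TRUE), 3))
```

With only 32 observations the test has limited power, so it is best read together with the Q-Q plot rather than instead of it.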

Based on the final model fitting results, we can conclude:

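  1. For each 1000 lb increase in wt, mpg decreases by about 2.5 (adjusted for hp, cyl, and am).
  2. mpg decreases slightly as hp increases, by about 0.03 per additional horsepower (adjusted for wt, cyl, and am).
  3. Compared with 4-cylinder cars, 6-cylinder cars get about 3.0 mpg less and 8-cylinder cars about 2.2 mpg less (adjusted for hp, wt, and am); only the 6-cylinder difference is significant at the 0.05 level.
  4. Manual transmission (am = 1) has a higher mpg than automatic transmission (am = 0), by about 1.8 mpg after adjusting for wt, hp, and cyl, but this difference is not statistically significant (p = 0.21).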
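
To put the transmission effect in context, the unadjusted difference in mean mpg can be compared with the adjusted coefficient and its confidence interval. This is a minimal sketch, assuming the mtcars coding am = 0 for automatic and am = 1 for manual. The data frame `cars`, the factor `trans`, and the other object names are made up for illustration.

```r
data(mtcars)

# Label the transmission type: in mtcars, am = 0 is automatic and am = 1 is manual
cars <- transform(mtcars,
                  trans = factor(am, levels = c(0, 1),
                                 labels = c("automatic", "manual")))

# Unadjusted comparison: group means and a Welch two-sample t-test
group_means <- tapply(cars$mpg, cars$trans, mean)
print(round(group_means, 2))
print(t.test(mpg ~ trans, data = cars))

# Adjusted comparison: same predictors as the final model, fitted with lm()
adj_fit <- lm(mpg ~ as.factor(cyl) + trans + hp + wt, data = cars)
am_ci <- confint(adj_fit, "transmanual", level = 0.95)
print(am_ci)   # 95% confidence interval for manual minus automatic
```

If the interval for the adjusted coefficient contains zero, the data do not separate the two transmission types once weight, horsepower, and cylinder count are accounted for, even when the raw group means differ.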