This report explores the relationship between a set of variables and miles per gallon (MPG), which is the outcome. Two questions of interest were answered:
The required packages are loaded first:
library(datasets)
require(stats)
require(graphics)
require(car)
## Loading required package: car
require(MASS)
## Loading required package: MASS
require(knitr)
## Loading required package: knitr
require(markdown)
## Loading required package: markdown
Read in the data and perform exploratory analysis:
data(mtcars)
str(mtcars)
## 'data.frame': 32 obs. of 11 variables:
## $ mpg : num 21 21 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2 ...
## $ cyl : num 6 6 4 6 8 6 8 4 4 6 ...
## $ disp: num 160 160 108 258 360 ...
## $ hp : num 110 110 93 110 175 105 245 62 95 123 ...
## $ drat: num 3.9 3.9 3.85 3.08 3.15 2.76 3.21 3.69 3.92 3.92 ...
## $ wt : num 2.62 2.88 2.32 3.21 3.44 ...
## $ qsec: num 16.5 17 18.6 19.4 17 ...
## $ vs : num 0 0 1 1 0 1 0 1 1 1 ...
## $ am : num 1 1 1 0 0 0 0 0 0 0 ...
## $ gear: num 4 4 4 3 3 3 3 4 4 4 ...
## $ carb: num 4 4 1 1 2 1 4 2 2 4 ...
pairs(mtcars, panel=panel.smooth, main="MTCars Data")
fit <- glm(mpg ~ as.factor(cyl) + as.factor(vs) + as.factor(am) + as.factor(gear) + as.factor(carb) + disp + hp + drat + wt + qsec, data = mtcars)
#anova(fit)
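As a numeric companion to the scatterplot matrix, the sketch below ranks the other columns by the strength of their Pearson correlation with mpg. It assumes that treating cyl, vs, am, gear, and carb as plain numbers is acceptable for a quick screen. The name `mpg_cor` is introduced here for illustration only.

```r
# Correlation of mpg with every other column of mtcars,
# sorted by absolute strength (strongest first).
data(mtcars)

mpg_cor <- cor(mtcars)["mpg", ]                 # mpg row of the correlation matrix
mpg_cor <- mpg_cor[names(mpg_cor) != "mpg"]     # drop the trivial self-correlation
mpg_cor <- mpg_cor[order(abs(mpg_cor), decreasing = TRUE)]

round(mpg_cor, 2)
```

A ranking like this is only a screening aid: pairwise correlation ignores the overlap between predictors, which is what the model selection step has to sort out.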
Based on the pairwise scatterplots of the outcome against the predictors, variables such as cyl, disp, hp, drat, wt, vs, and am appear strongly correlated with mpg. We built the full model and performed stepwise model selection, which uses AIC, to choose the predictors for the final model.
step <- stepAIC(fit, direction="both")
step$anova
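To illustrate the criterion behind the stepwise search, the sketch below compares the AIC of the full model with that of the four-predictor model reported in this analysis, and recomputes AIC from the log-likelihood. The object names `full` and `reduced` and the helper `aic_by_hand` are hypothetical and introduced only for this example.

```r
data(mtcars)

# Full model and the reduced model reported in this analysis
full <- glm(mpg ~ as.factor(cyl) + as.factor(vs) + as.factor(am) +
              as.factor(gear) + as.factor(carb) + disp + hp + drat + wt + qsec,
            data = mtcars)
reduced <- glm(mpg ~ as.factor(cyl) + as.factor(am) + hp + wt, data = mtcars)

# AIC = -2 * log-likelihood + 2 * (number of estimated parameters)
aic_by_hand <- function(m) {
  ll <- logLik(m)
  -2 * as.numeric(ll) + 2 * attr(ll, "df")
}

# Built-in AIC and the hand-computed version, side by side
rbind(builtin = c(full = AIC(full), reduced = AIC(reduced)),
      by_hand = c(full = aic_by_hand(full), reduced = aic_by_hand(reduced)))
```

A lower AIC is preferred, so the reduced model should show the smaller value. Note that AIC trades fit against model size and is not a significance test, so a term kept by the search need not have a small p-value.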
Four variables (cyl, am, hp, and wt) were included in the final model.
fit <- glm(mpg ~ as.factor(cyl) + as.factor(am) + hp + wt, data = mtcars)
summary(fit)
##
## Call:
## glm(formula = mpg ~ as.factor(cyl) + as.factor(am) + hp + wt,
## data = mtcars)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -3.939 -1.256 -0.401 1.125 5.051
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 33.7083 2.6049 12.94 7.7e-13 ***
## as.factor(cyl)6 -3.0313 1.4073 -2.15 0.0407 *
## as.factor(cyl)8 -2.1637 2.2843 -0.95 0.3523
## as.factor(am)1 1.8092 1.3963 1.30 0.2065
## hp -0.0321 0.0137 -2.35 0.0269 *
## wt -2.4968 0.8856 -2.82 0.0091 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for gaussian family taken to be 5.809)
##
## Null deviance: 1126.05 on 31 degrees of freedom
## Residual deviance: 151.03 on 26 degrees of freedom
## AIC: 154.5
##
## Number of Fisher Scoring iterations: 2
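Two figures in the summary can be reproduced by simple arithmetic. The dispersion parameter is the residual deviance divided by its degrees of freedom, 151.03 / 26 ≈ 5.809. The share of null deviance accounted for by the model is 1 - 151.03 / 1126.05 ≈ 0.866. The sketch below is one way to check this, assuming the same final model. The names `dispersion`, `explained`, and `lm_fit` are illustrative.

```r
data(mtcars)
fit <- glm(mpg ~ as.factor(cyl) + as.factor(am) + hp + wt, data = mtcars)

# Dispersion estimate: residual deviance / residual degrees of freedom
dispersion <- fit$deviance / fit$df.residual

# Share of the null deviance accounted for by the model
explained <- 1 - fit$deviance / fit$null.deviance

# A Gaussian glm with identity link is ordinary least squares,
# so the same quantity is the R-squared reported by lm()
lm_fit <- lm(formula(fit), data = mtcars)

c(dispersion = dispersion,
  explained  = explained,
  lm_r2      = summary(lm_fit)$r.squared)
```

Because the default `glm` family is Gaussian, fitting with `lm` would give the same coefficients. `glm` reports deviances where `lm` reports sums of squares.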
The residual plots suggest that this final model is a decent model for exploring the relationship between mpg and the predictors.
layout(matrix(c(1,2,3,4),2,2)) # optional 4 graphs/page
plot(fit)
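The diagnostic plots can be supplemented with a couple of numeric checks. The sketch below is an optional addition, not part of the original analysis. It runs a Shapiro-Wilk normality test on the residuals and lists the observations with the largest Cook's distance. The names `res`, `sw`, and `cd` are illustrative.

```r
data(mtcars)
fit <- glm(mpg ~ as.factor(cyl) + as.factor(am) + hp + wt, data = mtcars)

# For a Gaussian glm the default (deviance) residuals equal observed minus fitted
res <- residuals(fit)

# Shapiro-Wilk test of normality for the residuals
sw <- shapiro.test(res)
sw$p.value

# Cook's distance flags observations with a large influence on the fit
cd <- cooks.distance(fit)
head(sort(cd, decreasing = TRUE), 3)
```

With only 32 observations the normality test has limited power, so it is best read alongside the Q-Q plot and not as a replacement for it.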
Based on the final model fitting results, we can conclude: