The purpose of this analysis is to explore whether Manual or Automatic Transmission provides better car mileage (MPG). After several iterations of linear regression models, it remains unclear which transmission type is better for mileage: once other variables are included, the Transmission coefficient is not statistically significant. This report covers some high-level exploratory analysis as well as several iterations of linear regression models. Please see the Appendix for the full model summaries and diagnostics.
#Simple summary
summary(mtcars$mpg)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 10.40 15.42 19.20 20.09 22.80 33.90
#Correlation between mpg and other vars
cor(mtcars, mtcars$mpg)[-1,]
## cyl disp hp drat wt qsec
## -0.8521620 -0.8475514 -0.7761684 0.6811719 -0.8676594 0.4186840
## vs am gear carb
## 0.6640389 0.5998324 0.4802848 -0.5509251
#Several other variables (wt, cyl, disp, hp) are more strongly correlated with MPG than am
For our baseline model, we include only Transmission type as a predictor.
NOTE: Due to restrictions on report length, the model summaries and diagnostics are placed in the Appendix.
#Fit a model using only Transmission type
fit_am<-lm(mpg~factor(am),data=mtcars)
print(fit_am)
##
## Call:
## lm(formula = mpg ~ factor(am), data = mtcars)
##
## Coefficients:
## (Intercept) factor(am)1
## 17.147 7.245
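The model summary shows that Transmission type (am) is highly significant (p = 0.000285). That is, cars with manual transmission have a mean MPG that is 7.25 higher than cars with automatic transmission (24.39 vs. 17.15). However, the R^2 is only 0.36, so Transmission type alone explains about 36% of the variance in MPG, which is a poor fit.
Next we fit a model chosen by stepwise selection to compete against the base model.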
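With a single binary predictor, the intercept is the mean MPG of the reference group (am = 0, automatic) and the coefficient is the difference between the two group means. A minimal sketch to confirm this, using only base R and the built-in mtcars data; the names group_means and mean_diff are illustrative and not part of the original analysis:

```r
# Mean MPG by transmission type (0 = automatic, 1 = manual)
group_means <- tapply(mtcars$mpg, mtcars$am, mean)
print(group_means)

# Difference in group means (manual minus automatic)
mean_diff <- unname(group_means["1"] - group_means["0"])

# Refit the base model and compare its slope with the difference in means
fit_am <- lm(mpg ~ factor(am), data = mtcars)
print(all.equal(mean_diff, unname(coef(fit_am)["factor(am)1"])))

# Welch two-sample t-test as a cross-check on the unadjusted difference
print(t.test(mpg ~ am, data = mtcars))
```

Note that the t-test does not assume equal variances, so its p-value will differ slightly from the one reported by lm.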
# Stepwise regression: start from the full model, then select by AIC
library(MASS)
fit <- lm(mpg ~ as.factor(am) + as.factor(cyl) + as.factor(vs) + as.factor(gear) + as.factor(carb) + disp + hp + drat + wt + qsec,
data=mtcars)
step <- stepAIC(fit, direction="both")
step$anova # display results
#Fit the final model from the stepwise regression
fit_final <- lm(mpg~ as.factor(am) + as.factor(cyl) + hp + wt, data=mtcars)
print(fit_final)
##
## Call:
## lm(formula = mpg ~ as.factor(am) + as.factor(cyl) + hp + wt,
## data = mtcars)
##
## Coefficients:
## (Intercept) as.factor(am)1 as.factor(cyl)6 as.factor(cyl)8
## 33.70832 1.80921 -3.03134 -2.16368
## hp wt
## -0.03211 -2.49683
Based on this model, Transmission type is not significant (p = 0.21); however, the model fits much better, with an adjusted R^2 of 0.84 (multiple R^2 of 0.87). For purposes of this report, only the statistically significant variables are interpreted. Six-cylinder cars have a mean MPG about 3.0 lower than four-cylinder cars, holding all else constant. For every unit increase in Gross Horsepower (hp), MPG decreases by 0.03, holding all else constant. For every 1,000 lb increase in Weight (wt), MPG decreases by 2.5, holding all else constant. The coefficients for Horsepower and Weight are the most intuitive: mileage efficiency (as measured by MPG) is sacrificed as the horsepower and weight of a vehicle increase.
Here we fit another model that excludes Number of Cylinders (cyl):
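One way to see why the Transmission coefficient cannot be interpreted here is to look at its confidence interval rather than only its p-value. The following is a sketch using base R's confint on the same model formula; the names ci and am_ci are illustrative:

```r
# Refit the model selected by the stepwise procedure
fit_final <- lm(mpg ~ as.factor(am) + as.factor(cyl) + hp + wt, data = mtcars)

# 95% confidence intervals for all coefficients
ci <- confint(fit_final, level = 0.95)
print(round(ci, 3))

# Interval for the manual-vs-automatic effect, adjusted for cyl, hp and wt
am_ci <- unname(ci["as.factor(am)1", ])

# TRUE when the interval spans zero, i.e. the sign of the effect is uncertain
print(am_ci[1] < 0 && am_ci[2] > 0)
```

An interval that spans zero is consistent with either transmission type being better once cylinders, horsepower, and weight are held constant.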
fit_final2 <- lm(mpg~ as.factor(am) + hp + wt, data=mtcars)
print(fit_final2)
##
## Call:
## lm(formula = mpg ~ as.factor(am) + hp + wt, data = mtcars)
##
## Coefficients:
## (Intercept) as.factor(am)1 hp wt
## 34.00288 2.08371 -0.03748 -2.87858
In this competing model, the fit is still strong (R^2 of 0.84), yet Transmission type is still not significant (p = 0.14). It is therefore not safe to interpret the coefficient of Transmission type. From the base model, we can only say that there is an unadjusted association between Transmission type and MPG; it is no longer statistically significant once Horsepower and Weight are accounted for.
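A plausible reason the Transmission effect fades is that transmission type is itself related to the other predictors. The sketch below, which is an addition and not part of the original analysis, compares weight and horsepower across transmission types and then uses a nested F-test to ask whether adding am improves a model that already contains hp and wt. The names fit_reduced, fit_full, and cmp are illustrative:

```r
# Average weight (1000 lb) and horsepower by transmission type
print(aggregate(cbind(wt, hp) ~ am, data = mtcars, FUN = mean))

# Correlation between transmission type and weight
am_wt_cor <- cor(mtcars$am, mtcars$wt)
print(am_wt_cor)

# Nested models: with and without transmission type
fit_reduced <- lm(mpg ~ hp + wt, data = mtcars)
fit_full <- lm(mpg ~ as.factor(am) + hp + wt, data = mtcars)

# F-test for the single added term; its p-value equals the t-test
# p-value for as.factor(am)1 in summary(fit_full)
cmp <- anova(fit_reduced, fit_full)
print(cmp)
```

If am and wt are strongly correlated, much of the raw MPG difference between transmission types may be attributable to weight rather than to the transmission itself.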
Base Model
summary(fit_am)
##
## Call:
## lm(formula = mpg ~ factor(am), data = mtcars)
##
## Residuals:
## Min 1Q Median 3Q Max
## -9.3923 -3.0923 -0.2974 3.2439 9.5077
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 17.147 1.125 15.247 1.13e-15 ***
## factor(am)1 7.245 1.764 4.106 0.000285 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 4.902 on 30 degrees of freedom
## Multiple R-squared: 0.3598, Adjusted R-squared: 0.3385
## F-statistic: 16.86 on 1 and 30 DF, p-value: 0.000285
par(mfrow=c(2,2))
plot(fit_am)
Challenger Model 1
summary(fit_final)
##
## Call:
## lm(formula = mpg ~ as.factor(am) + as.factor(cyl) + hp + wt,
## data = mtcars)
##
## Residuals:
## Min 1Q Median 3Q Max
## -3.9387 -1.2560 -0.4013 1.1253 5.0513
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 33.70832 2.60489 12.940 7.73e-13 ***
## as.factor(am)1 1.80921 1.39630 1.296 0.20646
## as.factor(cyl)6 -3.03134 1.40728 -2.154 0.04068 *
## as.factor(cyl)8 -2.16368 2.28425 -0.947 0.35225
## hp -0.03211 0.01369 -2.345 0.02693 *
## wt -2.49683 0.88559 -2.819 0.00908 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.41 on 26 degrees of freedom
## Multiple R-squared: 0.8659, Adjusted R-squared: 0.8401
## F-statistic: 33.57 on 5 and 26 DF, p-value: 1.506e-10
par(mfrow=c(2,2))
plot(fit_final)
Challenger Model 2
summary(fit_final2)
##
## Call:
## lm(formula = mpg ~ as.factor(am) + hp + wt, data = mtcars)
##
## Residuals:
## Min 1Q Median 3Q Max
## -3.4221 -1.7924 -0.3788 1.2249 5.5317
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 34.002875 2.642659 12.867 2.82e-13 ***
## as.factor(am)1 2.083710 1.376420 1.514 0.141268
## hp -0.037479 0.009605 -3.902 0.000546 ***
## wt -2.878575 0.904971 -3.181 0.003574 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.538 on 28 degrees of freedom
## Multiple R-squared: 0.8399, Adjusted R-squared: 0.8227
## F-statistic: 48.96 on 3 and 28 DF, p-value: 2.908e-11
par(mfrow=c(2,2))
plot(fit_final2)