Transmission Effects on Miles Per Gallon

Exploratory Analysis - Data Cleanup

Here we load the data set and convert transmission, number of cylinders, number of forward gears, and number of carburetors to factors. We then use the plyr revalue function to relabel the transmission levels 0 and 1 as Automatic and Manual. The plots of the exploratory analysis are available in the Appendix.

library(dplyr)
mtcars <- mtcars
mtcars$am <- factor(mtcars$am)
mtcars$cyl <- factor(mtcars$cyl)
mtcars$gear <- factor(mtcars$gear)
mtcars$carb <- factor(mtcars$carb)

mtcars$am <- plyr::revalue(mtcars$am, c("0"="Automatic","1"="Manual"))
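As a quick check on the cleanup, the sketch below repeats the factor conversions on a copy of the data and confirms the result. It is only an illustration: the copy name `cars` is an assumption, and base R `factor(..., labels = )` stands in for `plyr::revalue` so that no extra package is needed.

```r
# Work on a copy (hypothetical name `cars`) so the built-in data stays untouched
cars <- datasets::mtcars

# Base-R stand-in for plyr::revalue: relabel 0/1 as Automatic/Manual
cars$am   <- factor(cars$am, levels = c(0, 1), labels = c("Automatic", "Manual"))
cars$cyl  <- factor(cars$cyl)
cars$gear <- factor(cars$gear)
cars$carb <- factor(cars$carb)

# Confirm the column classes and count the cars per transmission type
sapply(cars[c("am", "cyl", "gear", "carb")], class)
table(cars$am)
```

The table should report 19 Automatic and 13 Manual cars, which are the group sizes behind every comparison in this report.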

Statistical Inference

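The Welch two-sample t-test gives p = 0.0014, so at the 5% level we reject the null hypothesis that Automatic and Manual transmissions have the same mean miles per gallon. The sample means are 17.15 mpg for Automatic and 24.39 mpg for Manual, and the 95% confidence interval for the difference (Automatic minus Manual) runs from -11.28 to -3.21 mpg.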

t.test(mpg ~ am, data = mtcars)
## 
##  Welch Two Sample t-test
## 
## data:  mpg by am
## t = -3.7671, df = 18.332, p-value = 0.001374
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -11.280194  -3.209684
## sample estimates:
## mean in group Automatic    mean in group Manual 
##                17.14737                24.39231
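The numbers in the printed test can also be pulled out of the returned object, which is handy when quoting them in the text. The sketch below assumes a copy of the data named `cars` (a hypothetical name) and uses base R relabeling in place of `plyr::revalue`.

```r
cars <- datasets::mtcars
cars$am <- factor(cars$am, levels = c(0, 1), labels = c("Automatic", "Manual"))

# Welch two-sample t-test (unequal variances is the t.test default)
tt <- t.test(mpg ~ am, data = cars)

tt$estimate                 # the two group means
unname(diff(tt$estimate))   # Manual minus Automatic, in mpg
tt$conf.int                 # 95% CI for Automatic minus Manual
tt$p.value                  # two-sided p-value
```

Note that the confidence interval is reported for Automatic minus Manual, which is why both of its limits are negative.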

Model Building

The first model regresses miles per gallon on transmission type alone, since our t-test was statistically significant. The second model uses all variables in the dataset as predictors. The final model starts from that full model and uses the R stepwise algorithm (step), which adds and drops terms to minimize AIC, to select a subset of predictors. The stepwise search is performed in both directions.

Model1 <- lm(mpg ~ am, data = mtcars)
Model2 <- lm(mpg ~., data = mtcars)
Model3 <- step(Model2,direction = "both")
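To see what the stepwise search actually keeps, the selected formula and the AIC before and after can be inspected directly. This is a minimal sketch under some assumptions: the copy name `cars` and the object names `full` and `selected` are hypothetical, base R relabeling replaces `plyr::revalue`, and `trace = 0` is added only to suppress the step-by-step log.

```r
cars <- datasets::mtcars
cars$am   <- factor(cars$am, levels = c(0, 1), labels = c("Automatic", "Manual"))
cars$cyl  <- factor(cars$cyl)
cars$gear <- factor(cars$gear)
cars$carb <- factor(cars$carb)

# Full model, then a stepwise search in both directions (quiet)
full     <- lm(mpg ~ ., data = cars)
selected <- step(full, direction = "both", trace = 0)

# Which predictors survived, and how the AIC used by step() changed
formula(selected)
c(full = extractAIC(full)[2], selected = extractAIC(selected)[2])
```

Because step() only accepts moves that lower the AIC, the selected model's value can never exceed the full model's.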

Interpreting Coefficients and R Squared for MPG ~ Transmission

This model is closely related to the t-test in the Statistical Inference section (it is the equal-variance version of the same comparison) and again shows that Manual transmissions average about 7.2 more miles per gallon than Automatic ones. However, the model explains only about 36% of the variance in miles per gallon.

summary(Model1)$coefficients[,1:4]
##              Estimate Std. Error   t value     Pr(>|t|)
## (Intercept) 17.147368   1.124603 15.247492 1.133983e-15
## amManual     7.244939   1.764422  4.106127 2.850207e-04
summary(Model1)$r.squared
## [1] 0.3597989
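The link between this regression and the two-group comparison can be checked numerically: with a single two-level factor, the intercept is the Automatic mean, the slope is the difference in group means, and the slope's p-value matches a pooled-variance t-test. The sketch below assumes a copy named `cars` and hypothetical object names `fit1` and `pooled`.

```r
cars <- datasets::mtcars
cars$am <- factor(cars$am, levels = c(0, 1), labels = c("Automatic", "Manual"))

fit1        <- lm(mpg ~ am, data = cars)
group_means <- tapply(cars$mpg, cars$am, mean)

# Intercept = Automatic mean; slope = Manual mean minus Automatic mean
c(intercept = coef(fit1)[["(Intercept)"]], automatic = group_means[["Automatic"]])
c(slope = coef(fit1)[["amManual"]],
  difference = group_means[["Manual"]] - group_means[["Automatic"]])

# The slope's p-value equals that of a t-test assuming equal variances
pooled <- t.test(mpg ~ am, data = cars, var.equal = TRUE)
c(lm = summary(fit1)$coefficients["amManual", "Pr(>|t|)"], pooled_t = pooled$p.value)
```

The Welch test used earlier does not assume equal variances, which is why its p-value (0.001374) differs from the regression's (0.000285).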

Interpreting Coefficients and R Squared for the Optimum Stepwise Model

From the coefficient output of this model we can make the following interpretations, each holding the other variables fixed: 6-cylinder engines get about 3.0 fewer miles per gallon than 4-cylinder engines; 8-cylinder engines get about 2.2 fewer, although this estimate is not statistically significant (p = 0.35); every 10-unit increase in horsepower lowers miles per gallon by about 0.32; every 1000-pound increase in weight lowers miles per gallon by about 2.5; and Manual transmissions add about 1.8 miles per gallon, an effect that is no longer statistically significant (p = 0.21) once the other variables are included. The model explains 86.6% of the variance, far more than the transmission-only model.

summary(Model3)$coefficients[,1:4]
##                Estimate Std. Error   t value     Pr(>|t|)
## (Intercept) 33.70832390 2.60488618 12.940421 7.733392e-13
## cyl6        -3.03134449 1.40728351 -2.154040 4.068272e-02
## cyl8        -2.16367532 2.28425172 -0.947214 3.522509e-01
## hp          -0.03210943 0.01369257 -2.345025 2.693461e-02
## wt          -2.49682942 0.88558779 -2.819404 9.081408e-03
## amManual     1.80921138 1.39630450  1.295714 2.064597e-01
summary(Model3)$r.squared
## [1] 0.8658799
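Confidence intervals make the uncertainty in these coefficients easier to read than p-values alone. The sketch below refits the model directly from the terms shown in the coefficient table (cyl, hp, wt, am) rather than rerunning the stepwise search; the names `cars` and `fit3` are assumptions for illustration.

```r
cars <- datasets::mtcars
cars$am  <- factor(cars$am, levels = c(0, 1), labels = c("Automatic", "Manual"))
cars$cyl <- factor(cars$cyl)

# Same terms as the stepwise model's coefficient table above
fit3 <- lm(mpg ~ cyl + hp + wt + am, data = cars)

# 95% confidence intervals for every coefficient
ci <- confint(fit3, level = 0.95)
ci

# The transmission effect after adjusting for cylinders, horsepower, and weight
ci["amManual", ]
```

The interval for amManual spans zero, which agrees with its p-value of 0.21: after adjustment, the data do not clearly separate the two transmission types.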

The analysis of variance below shows that adding number of engine cylinders, horsepower, and car weight significantly improves the fit over the transmission-only model. The p-value of about 1.7e-08 indicates that the improvement is statistically significant.

anova <- anova(Model1,Model3)
anova$"Pr(>F)"
## [1]           NA 1.688435e-08
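The p-value above comes from an F-test on nested models, which compares the drop in residual sum of squares with the residual variance of the larger model. As a rough sketch, the same number can be rebuilt by hand; the names `cars`, `fit1`, `fit3`, and `f_stat` are assumptions for illustration, and the larger model is fit directly from the terms in its coefficient table.

```r
cars <- datasets::mtcars
cars$am  <- factor(cars$am, levels = c(0, 1), labels = c("Automatic", "Manual"))
cars$cyl <- factor(cars$cyl)

fit1 <- lm(mpg ~ am, data = cars)
fit3 <- lm(mpg ~ cyl + hp + wt + am, data = cars)

# Residual sums of squares and the number of extra coefficients (4 here)
rss1   <- deviance(fit1)
rss3   <- deviance(fit3)
df_num <- df.residual(fit1) - df.residual(fit3)

# F statistic and its upper-tail p-value
f_stat <- ((rss1 - rss3) / df_num) / (rss3 / df.residual(fit3))
p_hand <- pf(f_stat, df_num, df.residual(fit3), lower.tail = FALSE)

# Compare with the value reported by anova()
c(by_hand = p_hand, anova = anova(fit1, fit3)$"Pr(>F)"[2])
```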

Model Diagnostics

These plots examine the leverage and influence that individual cars have on the stepwise model presented in this analysis. In the Residuals vs Fitted and Scale-Location plots the points are scattered randomly with no clear pattern, which supports the linearity and constant-variance assumptions. The Normal Q-Q and Residuals vs Leverage plots flag the Toyota Corolla, Fiat 128, and Chrysler Imperial as outliers.
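The four diagnostic plots can be drawn with the default plot method for lm objects, and the cars they flag can be cross-checked with numeric influence measures. This is a minimal sketch, assuming the model is refit from the terms in its coefficient table; the names `cars` and `fit3` are hypothetical.

```r
cars <- datasets::mtcars
cars$am  <- factor(cars$am, levels = c(0, 1), labels = c("Automatic", "Manual"))
cars$cyl <- factor(cars$cyl)
fit3 <- lm(mpg ~ cyl + hp + wt + am, data = cars)

# Residuals vs Fitted, Normal Q-Q, Scale-Location, Residuals vs Leverage
op <- par(mfrow = c(2, 2))
plot(fit3)
par(op)

# Numeric cross-checks: largest standardized residuals, leverage, Cook's distance
head(sort(abs(rstandard(fit3)), decreasing = TRUE), 3)
head(sort(hatvalues(fit3), decreasing = TRUE), 3)
head(sort(cooks.distance(fit3), decreasing = TRUE), 3)
```

Cars with large residuals are not always the ones with high leverage, so it is worth reading the three lists side by side rather than relying on any single one.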

Appendix

Pair Plot for the Motor Trend Cars Data Set
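A pair plot like the one named here could be produced along the following lines. This is only a sketch: it uses the numeric version of the data so that every panel is a scatterplot, and the smoother and the plot title are assumptions rather than the settings used for the original figure.

```r
# Numeric copy of the data (hypothetical name `cars_raw`)
cars_raw <- datasets::mtcars

# Scatterplot matrix of all 11 variables with a smooth trend line in each panel
pairs(cars_raw, panel = panel.smooth, main = "Motor Trend Cars Data Set")
```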

Box Plot for the t-Test Earlier in the Report
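A box plot of miles per gallon by transmission type could be drawn as sketched below. The copy name `cars`, the object name `bp`, and the axis labels and title are assumptions chosen for illustration.

```r
cars <- datasets::mtcars
cars$am <- factor(cars$am, levels = c(0, 1), labels = c("Automatic", "Manual"))

# One box per transmission type; boxplot() also returns the plotted statistics
bp <- boxplot(mpg ~ am, data = cars,
              xlab = "Transmission", ylab = "Miles per gallon",
              main = "MPG by Transmission Type")

bp$names   # group labels, in plotting order
bp$n       # number of cars in each group
```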