We used data from the 1974 edition of Motor Trend magazine to examine the effect of manual and automatic transmission types on fuel efficiency, measured in miles per gallon (MPG). We are particularly interested in answering the following questions:
There is indeed a difference in fuel efficiency based on transmission type.
Nonetheless, we concluded that transmission type on its own is not a good predictor of MPG: the weight of the car, number of cylinders and horsepower are better predictors of fuel efficiency, with an adjusted R-squared of 0.82. Once these variables are added to the model, the MPG difference for manual vehicles is much smaller.
# load the mtcars dataset
data(mtcars)
Qualitative variables such as the number of cylinders, gears and carburetors are converted to factor variables.
# convert qualitative data to factors
mtcars$cyl <- factor(mtcars$cyl)
mtcars$vs <- factor(mtcars$vs)
mtcars$gear <- factor(mtcars$gear)
mtcars$am <- factor(mtcars$am, labels = c("Automatic", "Manual"))
mtcars$carb <- factor(mtcars$carb)
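A quick way to confirm that the conversion took effect is to inspect the column classes and the recoded transmission levels. The following is a minimal sketch that repeats the conversion on a copy; the names `mt`, `col_classes` and `am_counts` are chosen here for illustration and are not part of the original analysis.

```r
# Work on a copy so the check does not depend on earlier state
data(mtcars)
mt <- mtcars
mt$cyl  <- factor(mt$cyl)
mt$vs   <- factor(mt$vs)
mt$gear <- factor(mt$gear)
mt$am   <- factor(mt$am, labels = c("Automatic", "Manual"))
mt$carb <- factor(mt$carb)

# Class of every column: the converted columns should report "factor"
col_classes <- sapply(mt, class)
print(col_classes)

# Number of cars at each level of the recoded transmission variable
am_counts <- table(mt$am)
print(am_counts)
```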
A pair-wise scatterplot matrix (Appendix, Figure 1) was constructed to observe the correlation between **miles per gallon “mpg”** and other variables of interest such as displacement “disp”, horsepower “hp”, cylinders “cyl”, rear axle ratio “drat”, weight “wt”, transmission “am”, V/S “vs”, etc.
A box-and-whisker plot (Appendix, Figure 2) was produced to explore the relationship between transmission type and MPG. It shows that MPG tends to be higher for cars with a manual transmission.
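The pattern in the boxplot can also be summarised numerically. As a rough sketch (the object names `mt`, `mpg_mean` and `mpg_median` are illustrative assumptions), the group means and medians can be computed with `tapply`:

```r
data(mtcars)
mt <- mtcars
mt$am <- factor(mt$am, labels = c("Automatic", "Manual"))

# Mean and median MPG within each transmission type
mpg_mean   <- tapply(mt$mpg, mt$am, mean)
mpg_median <- tapply(mt$mpg, mt$am, median)

# One row per statistic, one column per transmission type
print(round(rbind(mean = mpg_mean, median = mpg_median), 2))
```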
Here we build several regression models to find the best one. Residual analysis is performed after model selection.
The stepwise selection function `step` is used to determine the best model. It fits multiple regression models with different subsets of the variables, compares them by AIC, and returns the model with the best set of predictors.
# stepwise selection, starting from the full model (backward by default when no scope is given)
best.model <- step(lm(mpg ~ ., data = mtcars), trace = 0)
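To see what the stepwise search retained, one can print the formula of the selected model and compare its AIC with that of the full model. This is a sketch under the same factor conversions as above; `mt` and `full.model` are names introduced here for illustration.

```r
data(mtcars)
mt <- mtcars
mt$cyl  <- factor(mt$cyl)
mt$vs   <- factor(mt$vs)
mt$gear <- factor(mt$gear)
mt$am   <- factor(mt$am, labels = c("Automatic", "Manual"))
mt$carb <- factor(mt$carb)

# Full model with every predictor, then stepwise selection from it
full.model <- lm(mpg ~ ., data = mt)
best.model <- step(full.model, trace = 0)

# Terms kept by the selection
print(formula(best.model))

# A lower AIC indicates a better trade-off between fit and complexity
print(c(full = AIC(full.model), selected = AIC(best.model)))
```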
From the result presented in Figure 3 (see Appendix), we observe that the best model includes the cyl6, cyl8, hp, wt, and amManual terms (overall p-value < 0.001). The adjusted R-squared indicates that approximately 84% of the variance is explained by the regression model.
Examining the output of this model, we observe that, relative to 4-cylinder cars and holding the other variables fixed, mpg is lower for 6- and 8-cylinder cars (by 3.03 and 2.16, respectively) and decreases with horsepower (by 0.03 per unit of hp) and weight (by 2.5 per 1,000 lb). A manual transmission is associated with an increase of 1.8 mpg, although this coefficient is not statistically significant (p = 0.21).
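Confidence intervals make the uncertainty around these coefficients explicit. The sketch below refits the selected formula directly rather than rerunning `step`; the names `mt`, `fit` and `ci` are assumptions made for this example.

```r
data(mtcars)
mt <- mtcars
mt$cyl <- factor(mt$cyl)
mt$am  <- factor(mt$am, labels = c("Automatic", "Manual"))

# Refit the model reported by the stepwise selection
fit <- lm(mpg ~ cyl + hp + wt + am, data = mt)

# 95% confidence intervals next to the point estimates
ci <- confint(fit, level = 0.95)
print(round(cbind(estimate = coef(fit), ci), 2))
```

An interval that contains zero, as is the case for `amManual` here, means the data are compatible with no effect of that term once the other predictors are in the model.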
The residuals are plotted in the Appendix, Figure 4. From the residual plot, we observe that:
The Welch two-sample t-test output below shows that the difference in mean MPG between manual and automatic transmissions is statistically significant (p = 0.0014).
t.test(mpg ~ am, data = mtcars)
##
## Welch Two Sample t-test
##
## data: mpg by am
## t = -3.767, df = 18.33, p-value = 0.001374
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -11.28 -3.21
## sample estimates:
## mean in group Automatic mean in group Manual
## 17.15 24.39
From the result obtained, we can state that cars with a manual transmission have better fuel efficiency than cars with an automatic transmission: on average 24.39 mpg versus 17.15 mpg.
We nevertheless conclude that transmission type alone is not a good predictor of MPG: the weight of the car, number of cylinders and horsepower are also good predictors of fuel efficiency (see the best.model summary in Figure 3 of the Appendix), and once they are included the coefficient for manual transmission is no longer statistically significant (p = 0.21).
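The individual pieces of the t-test result can be pulled out of the returned object, which is convenient when quoting them in text. A minimal sketch, with `mt`, `tt` and `diff_means` as illustrative names:

```r
data(mtcars)
mt <- mtcars
mt$am <- factor(mt$am, labels = c("Automatic", "Manual"))

# Welch two-sample t-test (unequal variances is the default)
tt <- t.test(mpg ~ am, data = mt)

# Difference in group means, expressed as Manual minus Automatic
diff_means <- unname(tt$estimate[2] - tt$estimate[1])
print(round(diff_means, 2))

# The reported interval is for Automatic minus Manual, hence negative
print(round(tt$conf.int, 2))
print(signif(tt$p.value, 4))
```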
fit1 <- lm(mpg ~ am, data = mtcars)
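The transmission-only model `fit1` can be compared with the larger model to support the conclusion. The following sketch assumes the larger model is the one reported by the stepwise selection; `mt`, `fit2`, `adj_r2` and `cmp` are names introduced for this example.

```r
data(mtcars)
mt <- mtcars
mt$cyl <- factor(mt$cyl)
mt$am  <- factor(mt$am, labels = c("Automatic", "Manual"))

# Transmission only versus the stepwise-selected predictors
fit1 <- lm(mpg ~ am, data = mt)
fit2 <- lm(mpg ~ cyl + hp + wt + am, data = mt)

# Adjusted R-squared of each model
adj_r2 <- c(am_only  = summary(fit1)$adj.r.squared,
            selected = summary(fit2)$adj.r.squared)
print(round(adj_r2, 3))

# F-test for the nested models: do the extra predictors improve the fit?
cmp <- anova(fit1, fit2)
print(cmp)
```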
# pair-wise scatterplot
pairs(mtcars, panel = panel.smooth, main = "Pairwise plot of mtcars data")
# boxplot
boxplot(mpg ~ am, data = mtcars,
        xlab = "Transmission type", ylab = "Miles per gallon",
        main = "MPG vs Transmission", col = c("green", "purple"),
        names = c("Automatic", "Manual"))
# coefficients of the stepwise-selected model
best.model <- step(lm(mpg ~ ., data = mtcars), trace = 0)
summary(best.model)$coef
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 33.70832 2.60489 12.9404 7.733e-13
## cyl6 -3.03134 1.40728 -2.1540 4.068e-02
## cyl8 -2.16368 2.28425 -0.9472 3.523e-01
## hp -0.03211 0.01369 -2.3450 2.693e-02
## wt -2.49683 0.88559 -2.8194 9.081e-03
## amManual 1.80921 1.39630 1.2957 2.065e-01
# residual plot
par(mfrow=c(2, 2))
plot(best.model)
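The diagnostic plots can be complemented with a few numeric checks. This is only a sketch of one possible approach, not part of the original analysis: it refits the selected formula and applies a Shapiro-Wilk test to the residuals and `hatvalues` to find high-leverage cars. The names `mt`, `fit`, `res`, `sw` and `lev` are assumptions.

```r
data(mtcars)
mt <- mtcars
mt$cyl <- factor(mt$cyl)
mt$am  <- factor(mt$am, labels = c("Automatic", "Manual"))
fit <- lm(mpg ~ cyl + hp + wt + am, data = mt)

res <- residuals(fit)

# Shapiro-Wilk test; the null hypothesis is that the residuals are normal
sw <- shapiro.test(res)
print(sw)

# Three cars with the highest leverage (hat values)
lev <- sort(hatvalues(fit), decreasing = TRUE)
print(round(head(lev, 3), 3))
```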