The selected model gives an estimated coefficient of 1.4780477 for transmission type, which means an expected increase of about 1.478 MPG for a car with a manual transmission, holding the number of cylinders, horsepower, and weight constant. Therefore, to formally answer the stated question: manual transmission cars get about 1.478 more miles per gallon on average than automatic ones after adjusting for these confounding variables, although this adjusted difference is not statistically significant (p = 0.314).
The main goal of this analysis is to answer the question “Is an automatic or manual transmission better for MPG?” and, more concretely, to quantify the MPG difference between automatic and manual transmissions.
The dataset is called mtcars. It was extracted from the 1974 Motor Trend US magazine and comprises fuel consumption and 10 aspects of automobile design and performance for 32 automobiles (1973–74 models). Let’s begin by taking a quick look at our dataset.
Before we do any more advanced analysis, let’s just look at the boxplot of the MPG for the two types of transmission (see Appendix A) and the average MPG for each type (see Appendix B).
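As a rough sketch of how the group averages could be computed, the snippet below uses base R only. The names `cars_lab`, `mpg_means`, and `mpg_medians` are illustrative and are not part of the original analysis.

```r
data("mtcars")
# Work on a labelled copy so the original data frame stays untouched.
# In mtcars, am = 0 is automatic (AT) and am = 1 is manual (MT).
cars_lab <- transform(mtcars,
                      am = factor(am, levels = c(0, 1), labels = c("AT", "MT")))

# Mean and median MPG for each transmission type
mpg_means   <- tapply(cars_lab$mpg, cars_lab$am, mean)
mpg_medians <- tapply(cars_lab$mpg, cars_lab$am, median)

round(mpg_means, 3)
mpg_medians
```

The medians are the quantities the boxplot displays as its center lines, while the means are what the simple regression on transmission type estimates.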
Notice that the median MPG of manual transmission cars is higher than that of automatic transmission cars (Appendix A); however, this comparison leaves out many other aspects, such as number of cylinders, horsepower, and weight. Having said that, let’s try a linear model to analyse the problem, still looking only at the influence of the type of transmission on fuel consumption (MPG).
data("mtcars")
# Treat transmission as a factor: 0 = automatic (AT), 1 = manual (MT)
mtcars$am <- as.factor(mtcars$am)
levels(mtcars$am) <- c("AT", "MT")
fit <- lm(mpg ~ am, data = mtcars)
summary(fit)
##
## Call:
## lm(formula = mpg ~ am, data = mtcars)
##
## Residuals:
## Min 1Q Median 3Q Max
## -9.3923 -3.0923 -0.2974 3.2439 9.5077
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 17.147 1.125 15.247 1.13e-15 ***
## amMT 7.245 1.764 4.106 0.000285 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 4.902 on 30 degrees of freedom
## Multiple R-squared: 0.3598, Adjusted R-squared: 0.3385
## F-statistic: 16.86 on 1 and 30 DF, p-value: 0.000285
Notice that the transmission type (am) is statistically significant in its relationship with MPG, as shown by the Pr(>|t|) value of 0.000285 for amMT, which is below the typical thresholds (0.05, for example). Also, the expected change in MPG going from automatic to manual transmission is 7.245. However, this model explains only about 36% of the variance in MPG (R-squared of 0.3598).
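To summarize the coefficients: the intercept (17.147) is the mean MPG of the automatic transmission cars, and amMT (7.245) is the difference in mean MPG between manual and automatic cars, which gives a mean of about 24.392 MPG for manual transmission cars.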
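One way to tabulate these coefficients, sketched here with base R, is to pull the coefficient table and confidence intervals out of the fitted model and compare them with the raw group means. The object names (`cars_lab`, `fit_am`, `coef_table`, `ci`, `diff_means`) are illustrative assumptions.

```r
data("mtcars")
# Labelled copy: 0 = automatic (AT), 1 = manual (MT)
cars_lab <- transform(mtcars,
                      am = factor(am, levels = c(0, 1), labels = c("AT", "MT")))
fit_am <- lm(mpg ~ am, data = cars_lab)

# Coefficient table: estimate, standard error, t value, p-value
coef_table <- coef(summary(fit_am))
round(coef_table, 4)

# 95% confidence intervals for the intercept and the MT effect
ci <- confint(fit_am, level = 0.95)
ci

# With a single two-level factor, the slope equals the difference
# between the two group means.
group_means <- tapply(cars_lab$mpg, cars_lab$am, mean)
diff_means  <- unname(group_means["MT"] - group_means["AT"])
all.equal(unname(coef(fit_am)["amMT"]), diff_means)
```

The final comparison makes explicit that this simple model does nothing more than compare the two group averages, which is why confounders such as weight are not yet accounted for.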
However, intuitively we know that other predictors besides the type of transmission can also influence MPG, e.g. the weight of the vehicle (wt) and the number of cylinders (cyl). Let’s look at other variables that could be added to the model.
By analyzing the square root of the VIF, we can see that displacement (disp) has the most inflated variance (about 4.65), due to its correlation with other predictors, the number of cylinders for example. So, we can probably remove displacement from the pool of predictors without hurting the model, as well as other predictors such as carb, vs, and drat, which do not seem to have much influence on the result based on the correlation analysis.
data("mtcars")
require(car)
## Loading required package: car
fit <- lm(mpg ~ ., data = mtcars)
sqrt(vif(fit))
## cyl disp hp drat wt qsec vs am
## 3.920948 4.649757 3.135608 1.837014 3.894212 2.743712 2.228424 2.156035
## gear carb
## 2.314617 2.812249
cor(mtcars)[1,]
## mpg cyl disp hp drat wt
## 1.0000000 -0.8521620 -0.8475514 -0.7761684 0.6811719 -0.8676594
## qsec vs am gear carb
## 0.4186840 0.6640389 0.5998324 0.4802848 -0.5509251
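To make the VIF figures less of a black box, the sketch below recomputes the value for disp by hand, without the car package. It relies on the standard definition VIF_j = 1 / (1 - R_j^2), where R_j^2 is the R-squared from regressing predictor j on all the other predictors. The names `aux`, `r2_disp`, and `vif_disp` are illustrative.

```r
data("mtcars")
# Auxiliary regression: disp on every other predictor.
# mpg is the response of the main model, so it is excluded.
aux <- lm(disp ~ . - mpg, data = mtcars)

# Share of the variance of disp explained by the other predictors
r2_disp <- summary(aux)$r.squared

# Variance inflation factor and its square root
vif_disp <- 1 / (1 - r2_disp)
sqrt(vif_disp)
```

The square root is the factor by which the standard error of the disp coefficient is inflated relative to the case where disp is uncorrelated with the other predictors, and it should match the disp entry in the output above.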
Using the remaining variables, we can try some combinations of predictors and use the ANOVA test to verify which one generates a reasonable model.
# Nested model testing
fit1 <- lm(mpg ~ am, data = mtcars)
fit2 <- update(fit1, mpg ~ am + cyl, data = mtcars)
fit3 <- update(fit2, mpg ~ am + cyl + hp, data = mtcars)
fit4 <- update(fit3, mpg ~ am + cyl + hp + wt, data = mtcars)
fit5 <- update(fit4, mpg ~ am + cyl + hp + wt + qsec, data = mtcars)
anova(fit1, fit2, fit3, fit4, fit5)
## Analysis of Variance Table
##
## Model 1: mpg ~ am
## Model 2: mpg ~ am + cyl
## Model 3: mpg ~ am + cyl + hp
## Model 4: mpg ~ am + cyl + hp + wt
## Model 5: mpg ~ am + cyl + hp + wt + qsec
## Res.Df RSS Df Sum of Sq F Pr(>F)
## 1 30 720.90
## 2 29 271.36 1 449.53 73.1328 4.952e-09 ***
## 3 28 220.55 1 50.81 8.2659 0.007954 **
## 4 27 170.00 1 50.56 8.2246 0.008091 **
## 5 26 159.82 1 10.18 1.6562 0.209459
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
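The F values in this table can be reproduced by hand, which may help when reading it. The sketch below recomputes the row for adding wt (Model 3 to Model 4). The helper `rss` and the names `num`, `den`, `f_wt`, and `p_wt` are illustrative and not part of the original code.

```r
data("mtcars")
fit3 <- lm(mpg ~ am + cyl + hp, data = mtcars)
fit4 <- lm(mpg ~ am + cyl + hp + wt, data = mtcars)
fit5 <- lm(mpg ~ am + cyl + hp + wt + qsec, data = mtcars)

# Residual sum of squares of a fitted model
rss <- function(model) sum(residuals(model)^2)

# Numerator: drop in RSS from adding wt, per added parameter (1 here)
num <- (rss(fit3) - rss(fit4)) / (df.residual(fit3) - df.residual(fit4))

# Denominator: residual mean square of the largest model in the
# sequence, which is what anova() uses when given several models
den <- rss(fit5) / df.residual(fit5)

f_wt <- num / den
p_wt <- pf(f_wt, df1 = 1, df2 = df.residual(fit5), lower.tail = FALSE)
c(F = f_wt, p = p_wt)
```

A large F with a small p-value means the added predictor reduces the residual sum of squares by more than would be expected by chance.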
Model 4 seems to be a good model: adding wt gives a statistically significant reduction in the residual sum of squares (F = 8.22, Pr(>F) = 0.008), whereas adding qsec in Model 5 does not (Pr(>F) = 0.209). Model 4 also explains about 85% of the variance (or 83% if using the adjusted R-squared). Now let’s look at the model residuals and check for normality. One way to do that is to look at the Normal Q-Q plot in the upper right corner of the diagnostic plots: if the residuals fall roughly on the line, it is a good sign. Also, the Residuals vs Fitted plot should not show any pattern; a pattern would suggest heteroskedasticity (non-constant variance).
summary(fit4)
##
## Call:
## lm(formula = mpg ~ am + cyl + hp + wt, data = mtcars)
##
## Residuals:
## Min 1Q Median 3Q Max
## -3.4765 -1.8471 -0.5544 1.2758 5.6608
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 36.14654 3.10478 11.642 4.94e-12 ***
## am 1.47805 1.44115 1.026 0.3142
## cyl -0.74516 0.58279 -1.279 0.2119
## hp -0.02495 0.01365 -1.828 0.0786 .
## wt -2.60648 0.91984 -2.834 0.0086 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.509 on 27 degrees of freedom
## Multiple R-squared: 0.849, Adjusted R-squared: 0.8267
## F-statistic: 37.96 on 4 and 27 DF, p-value: 1.025e-10
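As a numeric complement to the Q-Q plot, the residuals can also be checked with a formal normality test. The sketch below uses the Shapiro-Wilk test from base R. This test is not part of the original analysis and is shown only as one possible check; `sw` is an illustrative name.

```r
data("mtcars")
fit4 <- lm(mpg ~ am + cyl + hp + wt, data = mtcars)

# Shapiro-Wilk test on the residuals of the selected model.
# Null hypothesis: the residuals come from a normal distribution.
sw <- shapiro.test(residuals(fit4))
sw$statistic
sw$p.value
```

A p-value above the chosen threshold (0.05, for example) means the test does not reject normality. With only 32 observations the test has limited power, so it is best read together with the diagnostic plots.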
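library(ggplot2)
# Label transmission codes: 0 = automatic (AT), 1 = manual (MT)
mtcars$am <- as.factor(mtcars$am)
levels(mtcars$am) <- c("AT", "MT")
# Appendix A: boxplot of MPG by transmission type
qplot(am, mpg, data = mtcars, geom = "boxplot", xlab = "Transmission Type", ylab = "MPG")
# Histogram of MPG, one panel per transmission type
qplot(mpg, data = mtcars, facets = am ~ ., binwidth = 3)
# Diagnostic plots for the selected model (fit4) in a 2 x 2 grid
par(mfrow = c(2, 2))
plot(fit4)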