When selecting a car, fuel efficiency is a common selection criterion. In this paper, we use the 1974 Motor Trend data [1] to evaluate factors that affect fuel efficiency. We are specifically interested in the effect of automatic versus manual transmission on gas mileage. Comparing several candidate models, we find that a relationship does exist between fuel efficiency and transmission type, but that it could also be explained by other factors such as vehicle weight.
We begin by loading the data, and casting the transmission type and number of cylinders to factors.
data(mtcars)
mtcars$am <- factor(mtcars$am, levels=c(0,1), labels=c('Automatic', 'Manual'))
mtcars$cyl <- factor(mtcars$cyl)
First we can check that a relationship does exist with a simple linear model.
model1 <- lm(mpg ~ am, mtcars)
summary(model1)$coefficients
##              Estimate Std. Error   t value     Pr(>|t|)
## (Intercept) 17.147368   1.124603 15.247492 1.133983e-15
## amManual     7.244939   1.764422  4.106127 2.850207e-04
We can see that a significant relationship exists (p-value < 0.001) when we do not consider other factors. Vehicles in our sample with a manual transmission got on average about 7.24 more miles per gallon than those with an automatic transmission. We must still check whether confounders could also explain the relationship.
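Because transmission type is the only predictor in this model, the `amManual` coefficient is the difference between the two group means. A minimal sketch of that check in base R follows; the names `cars`, `group_means`, `diff_means`, `fit`, and `coef_manual` are illustrative and not part of the original analysis, and the data are copied so the objects used elsewhere are left untouched.

```r
# Work on a private copy so the session's mtcars is not modified
cars <- datasets::mtcars
cars$am <- factor(cars$am, levels = c(0, 1), labels = c('Automatic', 'Manual'))

# Mean mpg within each transmission group
group_means <- tapply(cars$mpg, cars$am, mean)

# Difference in means, manual minus automatic
diff_means <- unname(group_means['Manual'] - group_means['Automatic'])

# The same quantity from the single-predictor regression
fit <- lm(mpg ~ am, data = cars)
coef_manual <- unname(coef(fit)['amManual'])

print(group_means)
print(c(difference = diff_means, coefficient = coef_manual))
```

The intercept of the regression corresponds to the automatic group mean, since 'Automatic' is the reference level of the factor.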
We can then look at additional variables: weight, and number of cylinders.
model2 <- update(model1, mpg ~ am + wt)
model3 <- update(model1, mpg ~ am + wt + cyl)
anova(model1, model2, model3)
## Analysis of Variance Table
##
## Model 1: mpg ~ am
## Model 2: mpg ~ am + wt
## Model 3: mpg ~ am + wt + cyl
##   Res.Df    RSS Df Sum of Sq       F    Pr(>F)
## 1     30 720.90
## 2     29 278.32  1    442.58 65.3095 1.107e-08 ***
## 3     27 182.97  2     95.35  7.0353  0.003473 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Both additions significantly reduce the residual sum of squares (p < 0.001 for weight, p = 0.003 for number of cylinders), so both variables should be included in the model.
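The F statistics in the table can be reproduced by hand, which makes clear what the comparison measures: each drop in residual sum of squares, per degree of freedom used, is divided by the residual mean square of the largest model. The following is a sketch under that reading of `anova()` for nested `lm` fits; the names `cars`, `m1`, `m2`, `m3`, `rss`, `df_res`, and `f_stats` are illustrative.

```r
# Private copy of the data with the same factor conversions
cars <- datasets::mtcars
cars$am <- factor(cars$am, levels = c(0, 1), labels = c('Automatic', 'Manual'))
cars$cyl <- factor(cars$cyl)

# The three nested models
m1 <- lm(mpg ~ am, data = cars)
m2 <- lm(mpg ~ am + wt, data = cars)
m3 <- lm(mpg ~ am + wt + cyl, data = cars)

# Residual sums of squares and residual degrees of freedom
rss <- c(deviance(m1), deviance(m2), deviance(m3))
df_res <- c(df.residual(m1), df.residual(m2), df.residual(m3))

# Denominator: residual mean square of the largest model
scale <- rss[3] / df_res[3]

# Numerators: reduction in RSS per degree of freedom spent
df_used <- -diff(df_res)
ss_drop <- -diff(rss)
f_stats <- (ss_drop / df_used) / scale

# Upper-tail p-values from the F distribution
p_vals <- pf(f_stats, df_used, df_res[3], lower.tail = FALSE)

print(data.frame(Df = df_used, SumSq = ss_drop, F = f_stats, p = p_vals))
```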
summary(model2)$coefficients
##                Estimate Std. Error     t value     Pr(>|t|)
## (Intercept) 37.32155131  3.0546385 12.21799285 5.843477e-13
## amManual    -0.02361522  1.5456453 -0.01527855 9.879146e-01
## wt          -5.35281145  0.7882438 -6.79080719 1.867415e-07
In this model, with vehicle weight added, the manual transmission term has a high p-value (about 0.99) and an estimate close to zero. This is consistent with correlation between the predictors: vehicle weight is correlated with both transmission type and fuel efficiency, so once weight is in the model, transmission type explains little additional variation. See also figure 1 in the appendix for a plot of this relationship.
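The overlap between weight and transmission type can be quantified directly. The sketch below, in base R, compares mean weight by transmission type, computes the correlation between weight and the 0/1 transmission indicator, and derives a variance inflation factor by hand. It assumes the two-predictor case, where the VIF reduces to 1 / (1 - r^2); the names `cars`, `wt_by_am`, `r_wt_am`, `vif_two`, and `r_wt_mpg` are illustrative.

```r
# Private copy; am is kept as the original 0/1 indicator for cor()
cars <- datasets::mtcars
am_label <- factor(cars$am, levels = c(0, 1), labels = c('Automatic', 'Manual'))

# Mean weight (in 1000 lbs) by transmission type
wt_by_am <- tapply(cars$wt, am_label, mean)

# Correlation between weight and the transmission indicator
r_wt_am <- cor(cars$wt, cars$am)

# With exactly two predictors, each has VIF = 1 / (1 - r^2)
vif_two <- 1 / (1 - r_wt_am^2)

# Correlation between weight and fuel efficiency
r_wt_mpg <- cor(cars$wt, cars$mpg)

print(wt_by_am)
print(c(cor_wt_am = r_wt_am, vif = vif_two, cor_wt_mpg = r_wt_mpg))
```

A VIF near 1 would indicate little shared variance between the two predictors, while larger values indicate that the standard error of each coefficient is inflated by the other.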
Looking at a plot of the residuals against the fitted values and a normal Q-Q plot of the studentized residuals for the full model, we can see that the studentized residuals are approximately normal.
par(mfcol = c(1,2))
plot(resid(model3) ~ predict(model3))
qqnorm(rstudent(model3))
qqline(rstudent(model3))
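A numerical check can complement the visual one. The sketch below applies the Shapiro-Wilk test to the studentized residuals of the full model. This test is an addition for illustration, not part of the original analysis, and the names `cars`, `full_fit`, `stud_res`, and `sw` are illustrative. A small p-value would count against normality, although with 32 observations the test has limited power.

```r
# Private copy of the data with the same factor conversions
cars <- datasets::mtcars
cars$am <- factor(cars$am, levels = c(0, 1), labels = c('Automatic', 'Manual'))
cars$cyl <- factor(cars$cyl)

# Refit the full model and extract studentized residuals
full_fit <- lm(mpg ~ am + wt + cyl, data = cars)
stud_res <- rstudent(full_fit)

# Shapiro-Wilk test of normality on the studentized residuals
sw <- shapiro.test(stud_res)
print(sw)

# The three observations with the largest absolute studentized residuals
print(head(sort(abs(stud_res), decreasing = TRUE), 3))
```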
[1] Henderson and Velleman (1981), Building multiple regression models interactively. Biometrics, 37, 391–411.
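# Figure 1: fuel efficiency against weight, colored by transmission type
plot(mpg ~ wt, mtcars, col=am, xlab = 'Vehicle Weight')
abline(lm(mpg ~ wt, mtcars))
# The groups are drawn as points, so the legend uses point symbols
legend('topright', pch=c(1,1), col=1:2, legend = c('Automatic', 'Manual'))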
plot(mpg ~ cyl, mtcars, col=am)
plot(mpg ~ hp, mtcars, col=am)
abline(lm(mpg ~ hp, mtcars))
legend('topright', pch=c(1,1), col=1:2, legend = c('Automatic', 'Manual'))