We find that manual transmission cars get better miles per gallon on average than automatic transmission cars.
Our final regression model, selected by stepwise AIC, is mpg ~ as.factor(am) + wt + qsec - 1. Because the model is fitted without a common intercept, each transmission level gets its own intercept: 12.554 for manual cars and 9.618 for automatic cars. Holding weight and quarter-mile time constant, a manual transmission car is therefore expected to get about 2.94 mpg more than an automatic transmission car.
Furthermore, the expected change in mpg for a 1000 lb increase in car weight is -3.917, and the expected change in mpg for a 1 second increase in quarter-mile time (qsec) is 1.226, in each case holding the other variables constant.
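To make the coefficient interpretation concrete, the sketch below fits the same predictors twice, once without and once with an intercept, and checks that the gap between the two transmission-level coefficients equals the single am coefficient in the intercept form. The object names (fit_no_int, fit_with_int, and so on) are illustrative and not part of the original analysis.

```r
# Final model as reported: no common intercept, one intercept per am level.
fit_no_int <- lm(mpg ~ as.factor(am) + wt + qsec - 1, data = mtcars)

# Same predictors with an intercept: am = 0 (automatic) is the baseline,
# so the am = 1 coefficient is the manual-minus-automatic difference.
fit_with_int <- lm(mpg ~ as.factor(am) + wt + qsec, data = mtcars)

b <- coef(fit_no_int)
am_diff_from_levels <- unname(b["as.factor(am)1"] - b["as.factor(am)0"])
am_diff_direct <- unname(coef(fit_with_int)["as.factor(am)1"])

# Both values are the adjusted manual-vs-automatic difference in mpg.
print(c(from_levels = am_diff_from_levels, direct = am_diff_direct))

# 95% confidence interval for the adjusted difference.
print(confint(fit_with_int)["as.factor(am)1", ])
```

The intercept form is often easier to read when the question is about the difference between groups, since the difference and its interval come straight out of the fit.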
Our exploratory analysis and charts show that, on average, manual transmission cars have a higher mpg and thus get better gas mileage. However, the correlations between mpg and the other variables in the dataset (see appendix) are high for several variables besides transmission, so those variables need to be accounted for when determining which transmission type is more fuel efficient.
As a baseline, we examine the results of the simple linear model with mpg as the outcome and am (transmission type) as the only predictor.
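The unadjusted comparison can be reproduced with a few lines of base R. This is a minimal sketch rather than the original exploratory code; it assumes the standard mtcars coding (am = 0 for automatic, am = 1 for manual) and uses a Welch two-sample t-test, which is one reasonable choice among several.

```r
# Mean mpg by transmission type (am: 0 = automatic, 1 = manual).
group_means <- tapply(mtcars$mpg, mtcars$am, mean)
print(group_means)

# Welch two-sample t-test for a difference in mean mpg between the groups.
tt <- t.test(mpg ~ am, data = mtcars)
print(tt$p.value)
print(tt$conf.int)
```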
fit <- lm(mpg ~ as.factor(am) - 1, data = mtcars)
summary(fit)$coef
## Estimate Std. Error t value Pr(>|t|)
## as.factor(am)0 17.14737 1.124603 15.24749 1.133983e-15
## as.factor(am)1 24.39231 1.359578 17.94109 1.376283e-17
From the results, we can see that the model has a very high R^2 of 0.949 and an adjusted R^2 (which accounts for the number of variables) of 0.945. Even so, we must still account for additional variables, given the high correlations observed in our exploratory analysis. Since a single-predictor model will not suffice, we expand to a multivariable model and use stepwise model selection by AIC to find the best combination of the available variables.
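One caveat when reading R^2 from a model fitted with `- 1`: for a no-intercept model, summary.lm computes R^2 relative to zero rather than relative to the mean of the outcome, so the value is not directly comparable to the usual R^2. The sketch below (object names are illustrative) fits the single-predictor model both ways; the fitted values are identical, but the reported R^2 differs.

```r
# Single-predictor model, with and without an intercept.
fit_no_int <- lm(mpg ~ as.factor(am) - 1, data = mtcars)
fit_with_int <- lm(mpg ~ as.factor(am), data = mtcars)

# Both parameterizations predict the group mean for every car.
same_fit <- isTRUE(all.equal(unname(fitted(fit_no_int)),
                             unname(fitted(fit_with_int))))

# R^2 is measured about zero without an intercept, about mean(mpg) with one.
r2_no_int <- summary(fit_no_int)$r.squared
r2_with_int <- summary(fit_with_int)$r.squared

print(same_fit)
print(c(no_intercept = r2_no_int, with_intercept = r2_with_int))
```

Comparing residual standard errors or AIC across models avoids this issue, since neither depends on how the baseline is defined.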
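library(MASS)  # provides stepAIC()
fitAll <- lm(mpg ~ as.factor(am) + wt + cyl + disp + hp + drat + vs + gear + qsec + carb - 1,
             data = mtcars)
# Backward elimination: start from the full model and drop terms while AIC improves
modelSelection <- stepAIC(fitAll, direction = "backward")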
## lm(formula = mpg ~ as.factor(am) + wt + qsec - 1, data = mtcars)
## Estimate Std. Error t value Pr(>|t|)
## as.factor(am)0 9.617781 6.9595930 1.381946 1.779152e-01
## as.factor(am)1 12.553618 6.0573391 2.072464 4.754335e-02
## wt -3.916504 0.7112016 -5.506882 6.952711e-06
## qsec 1.225886 0.2886696 4.246676 2.161737e-04
Our result, selected by stepwise AIC, suggests that the best-fitting model is mpg ~ as.factor(am) + wt + qsec - 1. The new model has an even higher R^2 of 0.988, compared with 0.949 for the single-variable model, and its residual standard error falls to 2.459 from 4.902.
We conclude that manual transmission vehicles have a better mpg on average and are thus more fuel efficient than automatic transmission vehicles. We assume that the stepwise AIC procedure retains the best explanatory variables. However, the dataset is small, and additional variables or more observations in either group (automatic vs. manual) could change the model selection.
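The residual standard errors quoted above can be recovered directly from the two fits. The sketch below also adds an AIC comparison and a nested F-test as a cross-check; those two steps are an addition for illustration and were not part of the reported analysis, and the object names are hypothetical.

```r
# Refit the single-predictor model and the selected model.
fit_am <- lm(mpg ~ as.factor(am) - 1, data = mtcars)
fit_final <- lm(mpg ~ as.factor(am) + wt + qsec - 1, data = mtcars)

# Residual standard error of each model.
rse <- c(am_only = summary(fit_am)$sigma, final = summary(fit_final)$sigma)
print(round(rse, 3))

# Lower AIC indicates the preferred model under this criterion.
aic <- c(am_only = AIC(fit_am), final = AIC(fit_final))
print(aic)

# F-test of whether wt and qsec jointly improve on the am-only model.
cmp <- anova(fit_am, fit_final)
print(cmp)
```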
Our Residuals vs. Fitted plot (see appendix) shows no obvious pattern that would indicate a departure from the model assumptions. The Q-Q plot, which assesses the normality of the error terms, shows no marked deviation. Furthermore, in the Residuals vs. Leverage plot, the higher-leverage points do not have unusually large residuals.
In our final model, the variables wt and qsec are much more significant than either level of the transmission factor variable am. In contrast, the simple linear model assigns very high significance to both am factor levels (automatic and manual). Thus, taken on its own, transmission type is quite explanatory and significant for mpg; however, the high variance of the binary am transmission variable and the strong correlations between mpg and other variables in the dataset lead us to our final best model.
Correlation coefficients between mpg and all other variables in the mtcars dataset show strong negative correlations for disp, cyl, and wt.
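The visual checks can be complemented with a few numeric diagnostics. This is a sketch under stated assumptions: the 2p/n leverage cutoff is a common rule of thumb rather than something used in the original analysis, the Shapiro-Wilk test is an optional addition, and the object names are illustrative.

```r
fit_final <- lm(mpg ~ as.factor(am) + wt + qsec - 1, data = mtcars)

# plot(fit_final) draws the standard diagnostic plots, including
# Residuals vs Fitted, Normal Q-Q, and Residuals vs Leverage.

# Leverage (hat values) and Cook's distance for each car.
lev <- hatvalues(fit_final)
cook <- cooks.distance(fit_final)

# Rule of thumb (an assumption here): flag leverage above 2 * p / n.
p <- length(coef(fit_final))
n <- nrow(mtcars)
high_lev <- names(lev)[lev > 2 * p / n]
print(high_lev)

# The three most influential observations by Cook's distance.
print(head(sort(cook, decreasing = TRUE), 3))

# Shapiro-Wilk test as a numeric companion to the Q-Q plot.
sw <- shapiro.test(residuals(fit_final))
print(sw$p.value)
```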
Correlation Matrix
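The correlations with mpg can be listed numerically as follows. This is a minimal sketch using Pearson correlations from base R; the original figure may have been produced differently, and the variable name cors is illustrative.

```r
# Pearson correlation of every variable with mpg, dropping mpg itself.
cors <- cor(mtcars)["mpg", ]
cors <- sort(cors[names(cors) != "mpg"])

# Sorted from most negative to most positive.
print(round(cors, 3))
```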
anova(modelSelection)
## Analysis of Variance Table
##
## Response: mpg
## Df Sum Sq Mean Sq F value Pr(>F)
## as.factor(am) 2 13321.4 6660.7 1101.685 < 2.2e-16 ***
## wt 1 442.6 442.6 73.203 2.673e-09 ***
## qsec 1 109.0 109.0 18.034 0.0002162 ***
## Residuals 28 169.3 6.0
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1