This project explores the mtcars dataset that ships with R and addresses two questions. Is an automatic or manual transmission better for MPG? How large is the MPG difference between automatic and manual transmissions? The analysis concludes that a manual transmission gets on average 2.9 more miles per gallon than an automatic transmission when weight and qsec (the time it takes to travel 1/4 mile) are accounted for.
The mean and median miles per gallon for cars with automatic transmission were 17.15 and 17.3, while for manual transmission they were 24.39 and 22.8, indicating that a manual transmission tends to result in higher miles per gallon. See Figure 1 for a visual representation of the data.
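The group summaries above can be reproduced with a short sketch like the one below. The `transmission` factor is not a column of the built-in dataset; it is assumed here to be a relabelling of `am` (0 = automatic, 1 = manual), and the names `cars`, `group_means`, and `group_medians` are illustrative only.

```r
# Work on a fresh copy of the built-in data so the original stays untouched.
cars <- datasets::mtcars

# Assumption: transmission is a relabelled version of am (0 = automatic, 1 = manual).
cars$transmission <- factor(cars$am, labels = c("automatic", "manual"))

# Mean and median mpg within each transmission group.
group_means   <- tapply(cars$mpg, cars$transmission, mean)
group_medians <- tapply(cars$mpg, cars$transmission, median)

print(round(group_means, 2))
print(group_medians)
```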
A two-sided Welch t-test indicates that the difference between the group means is unlikely to be due to chance (p = 0.0014).
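mtcars$transmission <- factor(mtcars$am, labels = c("automatic", "manual"))  # assumed recoding of am (0 = automatic, 1 = manual)
t.test(mpg ~ transmission, data = mtcars)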
##
## Welch Two Sample t-test
##
## data: mpg by transmission
## t = -3.7671, df = 18.332, p-value = 0.001374
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -11.280194 -3.209684
## sample estimates:
## mean in group automatic mean in group manual
## 17.14737 24.39231
fit <- lm(mpg ~ transmission, mtcars)
summary(fit)
##
## Call:
## lm(formula = mpg ~ transmission, data = mtcars)
##
## Residuals:
## Min 1Q Median 3Q Max
## -9.3923 -3.0923 -0.2974 3.2439 9.5077
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 17.147 1.125 15.247 1.13e-15 ***
## transmissionmanual 7.245 1.764 4.106 0.000285 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 4.902 on 30 degrees of freedom
## Multiple R-squared: 0.3598, Adjusted R-squared: 0.3385
## F-statistic: 16.86 on 1 and 30 DF, p-value: 0.000285
The summary of the basic regression model indicates a mean mpg difference of about 7.2 in favor of manual transmissions, the same difference as between the two group means above. However, the adjusted R-squared for the model is fairly low, indicating that only about 34% of the variance in mpg is explained by transmission type alone.
Stepwise regression was used to reduce the model (starting with all possible variables) to one where all remaining predictor variables are significant. This produced a model that includes the variables weight, qsec (the time it takes to travel 1/4 mile), and transmission type.
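The reduction step might look like the following sketch. The source does not state which function or selection criterion was used, so backward elimination with `step()` (which selects by AIC rather than by p-values) is an assumption, as are the names `cars`, `full`, `reduced`, and `kept`.

```r
# Fresh copy of the data, with am relabelled as an assumed transmission factor.
cars <- datasets::mtcars
cars$transmission <- factor(cars$am, labels = c("automatic", "manual"))
cars$am <- NULL  # avoid carrying the same information twice

# Start from the model with every available predictor.
full <- lm(mpg ~ ., data = cars)

# Backward elimination by AIC; trace = 0 suppresses the step-by-step log.
reduced <- step(full, direction = "backward", trace = 0)

# Predictors that survive the reduction.
kept <- attr(terms(reduced), "term.labels")
print(kept)
summary(reduced)
```

Because `step()` compares models by AIC, it is worth checking the p-values in `summary(reduced)` separately if significance of every remaining predictor is the goal.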
The summary of this model reports an adjusted R-squared of about 0.83, a substantial improvement over the basic model. We can further investigate this regression model by looking at its residuals (see Figure 2). The plot in the upper left of Figure 2 is a standard residual plot, showing residuals against fitted values. The residuals appear randomly scattered, which suggests that a linear model may be appropriate. The plot in the upper right is a normal quantile plot of the residuals. A perfectly linear plot would indicate normally distributed residuals. In this case the plot is close to linear but not perfectly so, which is a bit concerning. The plot in the bottom left (Scale-Location plot) shows the square roots of the absolute values of the standardized residuals against the fitted values. The line is sloped, indicating heteroscedasticity, which is not ideal. The last plot (bottom right) labels observations that we may want to investigate as possibly having undue influence on the regression relationship. Removing these observations and refitting may give more insight into the quality of the model.
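A minimal sketch of how the four diagnostic panels described above could be drawn with base R follows. It assumes the reduced model is `mpg ~ wt + qsec + transmission`, with `transmission` a relabelled copy of `am`; the names `cars`, `fit_reduced`, and `adj_r2` are illustrative.

```r
# Fresh copy of the data, with am relabelled as an assumed transmission factor.
cars <- datasets::mtcars
cars$transmission <- factor(cars$am, labels = c("automatic", "manual"))

# The reduced model described in the text.
fit_reduced <- lm(mpg ~ wt + qsec + transmission, data = cars)

# plot.lm draws, by default: Residuals vs Fitted, Normal Q-Q,
# Scale-Location, and Residuals vs Leverage. A 2 x 2 grid is filled row by row.
old_par <- par(mfrow = c(2, 2))
plot(fit_reduced)
par(old_par)  # restore the previous plotting layout

# Adjusted R-squared of the reduced model.
adj_r2 <- summary(fit_reduced)$adj.r.squared
print(adj_r2)
```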
We can use Cook’s Distance to decide which observations to try removing. In this case the observations were ordered by Cook’s Distance and the three with the highest values were removed before refitting the model. The plots in Figure 3 shed some light on whether removing them made a difference. The upper right plot (normal quantile plot) is now much closer to linear, suggesting approximately normally distributed residuals. The Scale-Location plot, while not flat, has a smaller slope than before, which indicates a move towards greater homoscedasticity. Given that the diagnostics are moving in the right direction, we can feel somewhat more confident about the model.
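The removal step can be sketched as follows, under the same assumptions as before (`transmission` is a relabelled `am`, and the reduced model is `mpg ~ wt + qsec + transmission`). The names `cd`, `to_drop`, `trimmed`, and `refit` are hypothetical.

```r
# Fresh copy of the data and the reduced model from the text.
cars <- datasets::mtcars
cars$transmission <- factor(cars$am, labels = c("automatic", "manual"))
fit_reduced <- lm(mpg ~ wt + qsec + transmission, data = cars)

# Cook's distance for every observation, largest first.
cd <- sort(cooks.distance(fit_reduced), decreasing = TRUE)

# Row names of the three most influential observations.
to_drop <- names(cd)[1:3]
print(to_drop)

# Refit the same model without those three rows.
trimmed <- cars[!(rownames(cars) %in% to_drop), ]
refit <- lm(mpg ~ wt + qsec + transmission, data = trimmed)

# Diagnostic panels for the refitted model, in the same 2 x 2 layout.
old_par <- par(mfrow = c(2, 2))
plot(refit)
par(old_par)
```

Dropping a fixed number of points is a judgment call; comparing the coefficients of `fit_reduced` and `refit` shows how sensitive the estimates are to that choice.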
Our chosen model indicates that a manual transmission gets on average 2.9 more miles per gallon than an automatic transmission, when weight and qsec value are accounted for.
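To read the transmission effect directly off the fitted model, a sketch along these lines could be used. It again assumes that `transmission` is a relabelled `am`, so the coefficient is named `transmissionmanual`; `est` and `ci` are illustrative names.

```r
# Fresh copy of the data and the chosen model from the text.
cars <- datasets::mtcars
cars$transmission <- factor(cars$am, labels = c("automatic", "manual"))
fit_reduced <- lm(mpg ~ wt + qsec + transmission, data = cars)

# Estimated mpg difference (manual minus automatic), holding wt and qsec fixed.
est <- coef(fit_reduced)[["transmissionmanual"]]

# 95% confidence interval for that difference.
ci <- confint(fit_reduced)["transmissionmanual", ]

print(round(est, 2))
print(ci)
```

The width of the interval gives a sense of how precisely the difference is estimated from only 32 cars.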
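Figure 1: Miles per gallon by transmission type
Figure 2: Diagnostic plots for the model with weight, qsec, and transmission type
Figure 3: Diagnostic plots after removing the three observations with the highest Cook’s Distance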