This analysis is based on the mtcars data set that comes with the programming language R.
The data comes from the 1974 Motor Trend US magazine, and comprises fuel consumption and 10 aspects of automobile design and performance for 32 automobiles (1973-74 models).
The analysis finds that the type of transmission (automatic vs. manual) is not associated with fuel efficiency (miles per gallon) once car weight is accounted for. Manual transmission cars tend to be lighter, so their higher efficiency is more plausibly explained by weight than by the transmission itself.
Upon loading R, the mtcars data set is already available by default.
Count the unique values in each column to see which variables in mtcars are better treated as factors and which as continuous.
apply(mtcars,2,function(x)length(unique(x)))
##  mpg  cyl disp   hp drat   wt qsec   vs   am gear carb
##   25    3   27   22   22   29   30    2    2    3    6
Make a new data frame, mtcars.transformed, with am converted to a factor. The code for this step is in the appendix, along with any other code not shown in the main document.
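Since the conversion code is not shown here, the following is a minimal sketch of what it might look like. The name `mtcars.transformed` and the level label `Manual` are taken from the model output later in this document (the `amManual` coefficient); the label `Automatic` for the reference level is an assumption.

```r
# Sketch: copy mtcars and recode am (0 = automatic, 1 = manual) as a factor.
# "Manual" matches the amManual coefficient reported in the models;
# "Automatic" is an assumed label for the reference level.
mtcars.transformed <- mtcars
mtcars.transformed$am <- factor(mtcars$am,
                                levels = c(0, 1),
                                labels = c("Automatic", "Manual"))

# Count cars per transmission type as a quick sanity check.
table(mtcars.transformed$am)
```

Putting the automatic level first makes it the reference category, so model coefficients describe manual cars relative to automatic cars.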
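For each numeric variable other than mpg, check whether it differs between automatic and manual cars, since such a variable could confound the relationship between am and mpg.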
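One way such a check could be run is sketched below. The test used in the original analysis is not shown, so the Welch two-sample t-test here is an assumption, as is the choice of `disp`, `hp`, `drat`, `wt`, and `qsec` as the "numeric" variables (the columns with many unique values in the table above).

```r
# Sketch: compare each continuous variable between transmission types.
# Assumptions: Welch t-test; the five candidate columns listed below.
is.manual  <- mtcars$am == 1
candidates <- c("disp", "hp", "drat", "wt", "qsec")

p.values <- sapply(candidates, function(v) {
  x <- mtcars[[v]]
  # Two-sample t-test: automatic cars vs. manual cars.
  t.test(x[!is.manual], x[is.manual])$p.value
})

# Smallest p-values first.
sort(p.values)
```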
Variables disp, drat, and wt are significantly associated with automatic vs. manual transmission. Are any of these associated with mpg? Let’s make a plot and see.
The correlation with mpg is much stronger for disp and wt than for drat.
Weight is particularly interesting, since the difference in weight between automatic and manual cars was also so large.
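The strength of these relationships can also be checked numerically. The sketch below computes Pearson correlations with mpg and draws simple scatterplots; using base graphics rather than ggplot2 is a choice made here to keep the example free of package dependencies, not necessarily what the original plot used.

```r
# Sketch: correlation of mpg with each candidate confounder.
vars <- c("disp", "drat", "wt")
r <- sapply(vars, function(v) cor(mtcars$mpg, mtcars[[v]]))
round(r, 3)

# One scatterplot per variable, with a least-squares line for reference.
old.par <- par(mfrow = c(1, 3))
for (v in vars) {
  plot(mtcars[[v]], mtcars$mpg, xlab = v, ylab = "mpg", pch = 19)
  abline(lm(mtcars$mpg ~ mtcars[[v]]))
}
par(old.par)
```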
Let’s try running a linear model for the relationship between automatic vs. manual transmission and mpg, with and without controlling for weight. Also run a linear model for automatic vs. manual transmission and weight.
summary(lm(mpg ~ am,data=mtcars.transformed))$coef
##              Estimate Std. Error   t value     Pr(>|t|)
## (Intercept) 17.147368   1.124603 15.247492 1.133983e-15
## amManual     7.244939   1.764422  4.106127 2.850207e-04
summary(lm(mpg ~ am + wt,data=mtcars.transformed))$coef
##                Estimate Std. Error     t value     Pr(>|t|)
## (Intercept) 37.32155131  3.0546385 12.21799285 5.843477e-13
## amManual    -0.02361522  1.5456453 -0.01527855 9.879146e-01
## wt          -5.35281145  0.7882438 -6.79080719 1.867415e-07
summary(lm(wt ~ am,data=mtcars.transformed))$coef
##              Estimate Std. Error   t value     Pr(>|t|)
## (Intercept)  3.768895  0.1646171 22.894914 1.489921e-20
## amManual    -1.357895  0.2582726 -5.257603 1.125440e-05
Without weight in the model, transmission type is significantly associated with mpg (p ≈ 0.0003): manual cars average 7.24 more miles per gallon than automatic cars.
Controlling for weight, however, we no longer see an association between transmission type and mpg. The coefficient for manual transmission drops to about -0.02 mpg, with a p-value of 0.99.
This makes sense looking at the data. Holding transmission type fixed, an increase in weight of 1,000 lb is associated with a decrease of about 5.35 miles per gallon. Meanwhile, manual cars are on average about 1,360 lb lighter than automatic cars (p ≈ 0.00001).
The higher mpg of manual transmission cars is most likely due to the fact that they are lighter, and lighter cars are more efficient, rather than something inherent to the type of transmission.
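The three models above fit together arithmetically: for least-squares fits, the unadjusted transmission coefficient equals the weight-adjusted coefficient plus the weight slope times the weight difference between transmission types. The sketch below checks this with the reported numbers, roughly -0.02 + (-5.35 × -1.36) ≈ 7.24. It rebuilds `mtcars.transformed` under the assumption that am was recoded with the labels `Automatic` and `Manual`; the variable names are illustrative.

```r
# Sketch: decompose the unadjusted am effect into a direct part and a part
# that runs through weight. Factor labels are assumed, as noted above.
mtcars.transformed <- transform(
  mtcars,
  am = factor(am, levels = c(0, 1), labels = c("Automatic", "Manual"))
)

b.unadjusted <- coef(lm(mpg ~ am, data = mtcars.transformed))["amManual"]
b.adjusted   <- coef(lm(mpg ~ am + wt, data = mtcars.transformed))
b.wt.on.am   <- coef(lm(wt ~ am, data = mtcars.transformed))["amManual"]

# Part of the mpg difference attributable to manual cars being lighter.
via.weight <- b.adjusted["wt"] * b.wt.on.am

c(unadjusted = unname(b.unadjusted),
  direct     = unname(b.adjusted["amManual"]),
  via.weight = unname(via.weight))
```

Nearly all of the 7.24 mpg difference is carried by the weight term, which is consistent with the interpretation given above.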
We have already answered the question of whether or not transmission is associated with fuel efficiency.
As a supplementary look at this data, let’s now model mpg based solely on weight.
fit <- lm(mpg ~ wt,data=mtcars.transformed)
summary(fit)$coef
##              Estimate Std. Error   t value     Pr(>|t|)
## (Intercept) 37.285126   1.877627 19.857575 8.241799e-19
## wt          -5.344472   0.559101 -9.559044 1.293959e-10
confint(fit)
##                 2.5 %    97.5 %
## (Intercept) 33.450500 41.119753
## wt          -6.486308 -4.202635
We find that every 1,000 lb increase in car weight is associated with a 5.34 mile per gallon decrease in efficiency, with a 95% confidence interval of 4.20 to 6.49 miles per gallon. Because this is observational data, the estimate describes an association rather than a causal effect.
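To make the slope concrete, the sketch below predicts mpg for two hypothetical cars weighing 3,000 lb and 4,000 lb (wt is recorded in units of 1,000 lb). The data frame `new.cars` is introduced here for illustration only; the fit uses `mtcars` directly, which gives the same model because am is not involved.

```r
# Sketch: read the wt coefficient as a difference between two predictions.
fit <- lm(mpg ~ wt, data = mtcars)

# Hypothetical cars: 3,000 lb and 4,000 lb.
new.cars <- data.frame(wt = c(3, 4))
pred <- predict(fit, newdata = new.cars, interval = "confidence")
pred

# The gap between the two fitted values equals the wt coefficient.
diff(pred[, "fit"])
```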
Plot residuals to check for outliers.
library(ggplot2);library(DBI);library(ggfortify)
autoplot(fit, label.size = 3)
We find that the Chrysler Imperial, Fiat 128, and Toyota Corolla stand out as possible outliers. They have the largest residuals and are also relatively high-leverage points (their weights are far from the mean). Let’s make a simple scatterplot again, highlighting these points and showing what the model would look like without them.
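A minimal sketch of that comparison follows. It uses base graphics instead of ggplot2 to stay dependency-free, and the names `flagged`, `diagnostics`, and `fit.trimmed` are illustrative rather than taken from the original code.

```r
# Sketch: inspect the three flagged cars and refit the model without them.
fit <- lm(mpg ~ wt, data = mtcars)
flagged <- c("Chrysler Imperial", "Fiat 128", "Toyota Corolla")

# Residual and leverage (hat value) for every car, indexed by car name.
diagnostics <- data.frame(residual = resid(fit),
                          leverage = hatvalues(fit),
                          row.names = rownames(mtcars))
diagnostics[flagged, ]

# Refit on the remaining 29 cars and compare coefficients side by side.
is.flagged  <- rownames(mtcars) %in% flagged
fit.trimmed <- lm(mpg ~ wt, data = mtcars[!is.flagged, ])
rbind(all = coef(fit), trimmed = coef(fit.trimmed))

# Scatterplot: flagged cars in red, full fit solid, trimmed fit dashed.
plot(mtcars$wt, mtcars$mpg, pch = 19,
     col = ifelse(is.flagged, "red", "black"),
     xlab = "Weight (1,000 lb)", ylab = "Miles per gallon")
abline(fit)
abline(fit.trimmed, lty = 2)
```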
Even though these points have high leverage and large residuals, the fitted model does not look very different without them. So let’s leave all points in the model.