This analysis seeks to answer the following two questions:
The questions are addressed by applying linear regression to the mtcars dataset [1]. The analysis concludes that, after accounting for the weight of the car, the dataset does not show any significant influence of transmission type on fuel economy. Hence the answers to the above questions, with reference to the fitted model, are:
The following sections outline the analysis conducted to reach these conclusions.
To gain an appreciation for the variables present in the mtcars dataset, their pairwise relations are plotted as follows:
library(GGally)
ggpairs(mtcars)
See appendix 1 for the resultant figure.
From the generated pairwise plots, we see that transmission (am) is correlated with fuel economy (mpg) in this dataset. Of all variables, weight (wt) shows the strongest correlation with fuel economy. As described in [2], it is expected from physical principles that weight should be proportional to gallons-per-mile, and so inversely proportional to miles-per-gallon. This is supported by the strong negative correlation between mpg and wt.
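The correlations read off the pairwise plot can also be checked numerically. The following is a minimal sketch using base R's `cor`; the name `mpg_cor` is introduced here for illustration and is not part of the original analysis.

```r
# Correlation of every variable in mtcars with fuel economy (mpg)
mpg_cor <- cor(mtcars)["mpg", ]

# Drop mpg's correlation with itself and order by absolute strength
mpg_cor <- mpg_cor[names(mpg_cor) != "mpg"]
mpg_cor <- mpg_cor[order(abs(mpg_cor), decreasing = TRUE)]
round(mpg_cor, 2)

# Correlation between the two candidate regressors, weight and transmission
cor(mtcars$wt, mtcars$am)
```

Ordering by absolute value rather than signed value keeps strong negative relationships, such as the one with weight, at the top of the list.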
As a first step, the following simple model is fitted between the two variables of interest:
fit1 <- lm(mpg ~ am, mtcars)
summary(fit1)$coefficients
##              Estimate Std. Error   t value     Pr(>|t|)
## (Intercept) 17.147368   1.124603 15.247492 1.133983e-15
## am           7.244939   1.764422  4.106127 2.850207e-04
The resulting model shows that automatic cars (am = 0) have an average fuel economy of 17.15 mpg, while manual cars (am = 1) have an average fuel economy of 24.39 mpg. This suggests manual cars travel 7.24 miles further per gallon of fuel. This difference in fuel economy is statistically significant (p = 0.0003). A full summary of the model is provided in appendix 2.
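With a single 0/1 regressor, the intercept of the fitted model is the mean of the group coded 0 and the slope is the difference between the two group means. The sketch below checks this directly, relying on the mtcars coding of am (0 = automatic, 1 = manual); the names `mean_auto` and `mean_manual` are introduced here for illustration only.

```r
fit1 <- lm(mpg ~ am, mtcars)

# Mean fuel economy per transmission type (am: 0 = automatic, 1 = manual)
mean_auto   <- mean(mtcars$mpg[mtcars$am == 0])
mean_manual <- mean(mtcars$mpg[mtcars$am == 1])
c(automatic = mean_auto, manual = mean_manual)

# The intercept reproduces the automatic mean,
# and the am coefficient reproduces the manual-minus-automatic difference
all.equal(unname(coef(fit1)["(Intercept)"]), mean_auto)
all.equal(unname(coef(fit1)["am"]), mean_manual - mean_auto)
```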
From the coursework we know that omitting a variable which is correlated with the included variables leads to bias in the fitted model. From the exploratory analysis we know that fuel economy is strongly correlated with weight, and that weight is correlated with transmission type. Has the omission of weight in the simple model led to a bias in the estimated coefficients? To address this possibility, a model is fitted which includes weight as a regressor:
fit2 <- lm(mpg ~ wt + am, mtcars)
summary(fit2)$coefficients
##                Estimate Std. Error     t value     Pr(>|t|)
## (Intercept) 37.32155131  3.0546385 12.21799285 5.843477e-13
## wt          -5.35281145  0.7882438 -6.79080719 1.867415e-07
## am          -0.02361522  1.5456453 -0.01527855 9.879146e-01
The model including weight fits the data much better (R-squared = 0.75 vs 0.36 for the am-only model). Furthermore, once weight is accounted for, the effect of transmission on fuel economy disappears. The am variable is given a small negative coefficient in the model (-0.02), which is entirely consistent with the null hypothesis that transmission type has no effect on fuel economy (p = 0.99). Hence the analysis provides no evidence that transmission type affects fuel economy once weight is held fixed.
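The shift in the am coefficient between the two models can be decomposed using the standard omitted-variable identity for least squares: the am coefficient in the model without weight equals the am coefficient in the model with weight, plus the weight coefficient multiplied by the slope from regressing wt on am. The sketch below illustrates this; the auxiliary model `aux` and the names `direct` and `via_wt` are introduced here and are not part of the original analysis.

```r
fit1 <- lm(mpg ~ am, mtcars)        # short model, weight omitted
fit2 <- lm(mpg ~ wt + am, mtcars)   # long model, weight included
aux  <- lm(wt ~ am, mtcars)         # auxiliary regression of weight on transmission

# Part of the am effect that remains once weight is held fixed
direct <- unname(coef(fit2)["am"])

# Part that the short model attributes to am but that runs through weight
via_wt <- unname(coef(fit2)["wt"] * coef(aux)["am"])

c(direct = direct, via_weight = via_wt, short_model = unname(coef(fit1)["am"]))

# The two parts add up to the am coefficient of the short model
all.equal(direct + via_wt, unname(coef(fit1)["am"]))
```

Read this way, essentially all of the difference seen in the am-only model is carried by the weight term, because the manual cars in this dataset are lighter on average.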
To investigate the validity of the fitted model, the residuals are plotted:
par(mfrow = c(2, 2))
plot(fit2)
See appendix 4 for the resultant figure.
There do not appear to be any major issues in the residual plots: there are no clear systematic patterns in the residuals vs. fitted plot, and the Q-Q plot shows the residuals to be approximately normally distributed.
The plots do identify a few outlier points which are not well fitted by the model. The Chrysler Imperial is of particular concern due to relatively high leverage. Future work could focus on understanding and/or mitigating the effect of these outliers.
Overall, the residual plots do not suggest any major issues with the chosen model.
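The influence of individual cars noted above can be examined numerically as a complement to the diagnostic plots. The following is a sketch only, using base R's `hatvalues` and `cooks.distance`; the refit without the Chrysler Imperial is an illustrative sensitivity check that was not part of the original analysis, and `lev`, `cook` and `fit2_drop` are names introduced here.

```r
fit2 <- lm(mpg ~ wt + am, mtcars)

lev  <- hatvalues(fit2)        # leverage of each car
cook <- cooks.distance(fit2)   # influence of each car on the fitted coefficients

# Three largest values of each measure
head(sort(lev, decreasing = TRUE), 3)
head(sort(cook, decreasing = TRUE), 3)

# Sensitivity check: refit without the Chrysler Imperial and compare coefficients
fit2_drop <- lm(mpg ~ wt + am, mtcars[rownames(mtcars) != "Chrysler Imperial", ])
cbind(all_cars = coef(fit2), without_imperial = coef(fit2_drop))
```

If the coefficients change little between the two fits, the conclusion about transmission type does not hinge on that single observation.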
The above analysis of the mtcars dataset suggests that after accounting for vehicle weight, transmission type shows no significant effect on fuel economy.
##
## Call:
## lm(formula = mpg ~ am, data = mtcars)
##
## Residuals:
##     Min      1Q  Median      3Q     Max
## -9.3923 -3.0923 -0.2974  3.2439  9.5077
##
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept)   17.147      1.125  15.247 1.13e-15 ***
## am             7.245      1.764   4.106 0.000285 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 4.902 on 30 degrees of freedom
## Multiple R-squared: 0.3598, Adjusted R-squared: 0.3385
## F-statistic: 16.86 on 1 and 30 DF, p-value: 0.000285
##
## Call:
## lm(formula = mpg ~ wt + am, data = mtcars)
##
## Residuals:
##     Min      1Q  Median      3Q     Max
## -4.5295 -2.3619 -0.1317  1.4025  6.8782
##
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept) 37.32155    3.05464  12.218 5.84e-13 ***
## wt          -5.35281    0.78824  -6.791 1.87e-07 ***
## am          -0.02362    1.54565  -0.015    0.988
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 3.098 on 29 degrees of freedom
## Multiple R-squared: 0.7528, Adjusted R-squared: 0.7358
## F-statistic: 44.17 on 2 and 29 DF, p-value: 1.579e-09