In the final model, manual transmission was associated with an estimated 11.6 mpg higher mileage than automatic transmission (p = 0.009; t = 2.83; 95% CI 3.17 to 19.97), holding weight (wt) and number of cylinders (cyl) constant. Because the model includes a weight-by-transmission interaction, this estimate is the difference at wt = 0 and shrinks as weight increases. For cars with automatic transmission, mileage decreased by 2.4 mpg for every 1-unit (1000 lb) increase in weight, holding the number of cylinders constant. Mileage was 2.7 mpg lower for 6-cylinder cars (p = 0.056) and 4.8 mpg lower for 8-cylinder cars (p = 0.005) than for 4-cylinder cars, holding weight and type of transmission constant. The interaction between weight and transmission was significant (p = 0.007), suggesting that the relationship between miles per gallon and weight varies by type of transmission.
For this analysis we use the dataset mtcars, which is included with every standard installation of R. The data comprise fuel consumption (mpg) and 10 aspects of automobile design and performance (number of cylinders (cyl), engine displacement (disp), gross horsepower (hp), rear axle ratio (drat), weight (wt), quarter mile time (qsec), engine shape (vs), type of transmission (am), number of forward gears (gear), and number of carburetors (carb)) for 32 automobiles (1973-74 models). We focus on one question: "Which type of transmission (automatic or manual) produces better mileage (more miles per gallon, MPG)?" Below is a comparison of the distribution and mean (represented by the green lines) of miles per gallon for cars with automatic and manual transmission.
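As a quick orientation to the data described above, the following minimal sketch loads mtcars and computes the mean mileage by transmission type. The names `transmission` and `group_means` are illustrative and are not used elsewhere in this report.

```r
# Load the built-in dataset and inspect its structure.
data(mtcars)
str(mtcars)  # 32 observations of 11 numeric variables

# Label the transmission variable for readability (am: 0 = automatic, 1 = manual).
# `transmission` is an illustrative name, not part of the original analysis.
transmission <- factor(mtcars$am, levels = c(0, 1),
                       labels = c("automatic", "manual"))
table(transmission)

# Mean miles per gallon within each transmission group.
group_means <- tapply(mtcars$mpg, transmission, mean)
round(group_means, 2)
```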
| Shapiro.Wilk.normality.test | Automatic | Manual |
|---|---|---|
| statistic.w | 0.98 | 0.95 |
| p.value | 0.89 | 0.54 |
Assuming normality of our data (Shapiro-Wilk p-values of 0.89 and 0.54) and that random sampling was performed, the difference in average mileage between cars with automatic and manual transmission is significant, with a p-value of 0.0014 (t = -3.77, 95% CI -11.28 to -3.21).
| | t | deg.f | p.val | low.CI | upp.CI | auto | manual |
|---|---|---|---|---|---|---|---|
| t.test_mpg~am | -3.77 | 18.33 | 0.0014 | -11.28 | -3.21 | 17.15 | 24.39 |
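The normality check and the two-sample comparison summarized above can be reproduced along these lines. This is a sketch that assumes a Welch t-test (the default of `t.test`), which is consistent with the fractional degrees of freedom reported in the table; the object names are illustrative.

```r
# Split mileage by transmission type (am: 0 = automatic, 1 = manual).
auto_mpg   <- mtcars$mpg[mtcars$am == 0]
manual_mpg <- mtcars$mpg[mtcars$am == 1]

# Shapiro-Wilk normality test within each group.
sw_auto   <- shapiro.test(auto_mpg)
sw_manual <- shapiro.test(manual_mpg)
round(c(automatic = sw_auto$p.value, manual = sw_manual$p.value), 2)

# Welch two-sample t-test (unequal variances is the default).
tt <- t.test(mpg ~ am, data = mtcars)
round(c(t = unname(tt$statistic),
        df = unname(tt$parameter),
        p = tt$p.value), 4)
round(tt$conf.int, 2)
```

The sign of the t statistic and of the confidence interval reflects the order of subtraction (automatic minus manual), so negative values mean higher mileage for manual cars.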
Among the continuous variables, weight (wt) is the most strongly correlated with mileage (r = -0.87), so we use it as our initial predictor together with the type of transmission (factor variable am).
| | mpg | disp | hp | drat | wt | qsec |
|---|---|---|---|---|---|---|
| mpg | 1 | -0.8475514 | -0.7761684 | 0.6811719 | -0.8676594 | 0.418684 |
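The correlations in the table can be recomputed and ranked with a few lines of base R. This is a minimal sketch; `continuous`, `r`, and `strongest` are illustrative names.

```r
# Continuous candidate predictors considered in the table above.
continuous <- c("disp", "hp", "drat", "wt", "qsec")

# Pearson correlation of each predictor with mpg, as a named vector.
r <- sapply(mtcars[continuous], function(v) cor(mtcars$mpg, v))

# Rank by absolute correlation, strongest first.
round(r[order(abs(r), decreasing = TRUE)], 3)

# Name of the predictor most strongly correlated with mpg.
strongest <- names(which.max(abs(r)))
strongest
```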
Fitting the other variables in our models resulted in the model mpg ~ wt * factor(am) + factor(cyl) with the best fit (R-squared = 0.877). All coefficients have a significant p-value and a confidence interval that excludes 0, except factor(cyl)6 (p = 0.056; 95% CI -5.50 to 0.08).
| | Estimate | Std.Error | t.value | P.Value | 2.5 % | 97.5 % |
|---|---|---|---|---|---|---|
| (Intercept) | 29.775 | 2.840 | 10.483 | 0.000 | 23.936 | 35.613 |
| wt | -2.399 | 0.844 | -2.842 | 0.009 | -4.134 | -0.664 |
| factor(am)1 | 11.569 | 4.088 | 2.830 | 0.009 | 3.166 | 19.971 |
| factor(cyl)6 | -2.710 | 1.357 | -1.996 | 0.056 | -5.500 | 0.080 |
| factor(cyl)8 | -4.776 | 1.556 | -3.070 | 0.005 | -7.974 | -1.578 |
| wt:factor(am)1 | -4.068 | 1.397 | -2.911 | 0.007 | -6.940 | -1.196 |
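Because of the interaction term, the estimated manual-versus-automatic difference depends on weight: it equals the factor(am)1 coefficient plus the wt:factor(am)1 coefficient times wt. The sketch below works through that arithmetic; `am_effect` and `crossover` are illustrative helpers, not part of the original analysis.

```r
# Refit the final model so the example is self-contained.
fit <- lm(mpg ~ wt * factor(am) + factor(cyl), data = mtcars)
b <- coef(fit)

# Estimated manual-minus-automatic difference in mpg at a given weight
# (wt is in units of 1000 lbs), holding the number of cylinders fixed.
am_effect <- function(wt) {
  unname(b["factor(am)1"] + b["wt:factor(am)1"] * wt)
}

# Differences for a light, a mid-weight, and a heavy car.
round(am_effect(c(2, 3, 4)), 2)

# Weight at which the estimated difference is zero.
crossover <- unname(-b["factor(am)1"] / b["wt:factor(am)1"])
round(crossover, 2)
```

Under this model the estimated advantage of manual transmission applies to lighter cars and reverses for cars heavier than the crossover weight, which is roughly 2.8 thousand pounds given the coefficients in the table.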
The plots referred to in this section may be viewed in the Appendix. The sum of our residuals is -3.3306691 × 10^{-16}, which is effectively zero. The points on the plot of the Residuals vs. Fitted values are randomly scattered, and the non-constant variance score test is not significant (p = 0.113), which suggests that the error variance does not change with the level of the fitted values (no evidence of heteroscedasticity).
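These two checks could be run as follows. This is a sketch that assumes the score test came from `ncvTest` in the car package; that call is guarded so the example still runs where car is not installed.

```r
# Refit the final model so the example is self-contained.
fit <- lm(mpg ~ wt * factor(am) + factor(cyl), data = mtcars)

# Residuals of a least-squares fit with an intercept sum to zero,
# up to floating-point error.
res_sum <- sum(resid(fit))
res_sum

# Score test for non-constant error variance (assumed to be car::ncvTest).
if (requireNamespace("car", quietly = TRUE)) {
  print(car::ncvTest(fit))
}
```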
The Q-Q plot and the Shapiro-Wilk test (shapiro.test) indicate that the residuals are approximately normally distributed (p = 0.103). The Scale-Location plot and the Residuals vs. Leverage plot identified 3 points of interest that depart from the cluster of data points. We further examine these data points for their influence on our model using the function influence.measures.
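A minimal base-R sketch of the residual normality test and the influence screen is shown here. The flagging rule is the default one built into `influence.measures`; `flagged` is an illustrative name.

```r
# Refit the final model so the example is self-contained.
fit <- lm(mpg ~ wt * factor(am) + factor(cyl), data = mtcars)

# Shapiro-Wilk test on the model residuals.
sw_resid <- shapiro.test(resid(fit))
round(sw_resid$p.value, 3)

# Influence diagnostics (DFBETAS, DFFITS, covariance ratio, Cook's distance, hat values).
infl <- influence.measures(fit)

# Cars flagged as potentially influential on at least one measure.
flagged <- which(apply(infl$is.inf, 1, any))
names(flagged)
```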
Using the outlierTest function from the car package, we identified the Fiat 128 as an outlier.
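For readers without the car package, the idea behind such an outlier test can be sketched in base R: take the largest absolute studentized residual and apply a Bonferroni correction to its p-value. This is an approximation written for illustration, under the assumption that it mirrors what the test computes; it is not the package's own code.

```r
# Refit the final model so the example is self-contained.
fit <- lm(mpg ~ wt * factor(am) + factor(cyl), data = mtcars)

# Externally studentized residuals, one per car.
rs <- rstudent(fit)
n  <- length(rs)

# Each studentized residual follows a t distribution with one fewer
# degree of freedom than the residual degrees of freedom of the model.
df <- df.residual(fit) - 1
p_unadj <- 2 * pt(abs(rs), df = df, lower.tail = FALSE)

# Bonferroni adjustment for having inspected all n residuals.
p_bonf <- pmin(1, n * p_unadj)

# Report the most extreme observation.
worst <- which.max(abs(rs))
data.frame(rstudent = rs[worst],
           p_unadjusted = p_unadj[worst],
           p_bonferroni = p_bonf[worst])
```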
In the interest of reproducible research, the code for this analysis is available upon request.
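library(ggplot2)  # plotting
library(knitr)    # kable() for formatted tables
library(car)      # vif() for variance inflation factors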
g <- ggplot(mtcars, aes(x = wt, y = mpg, colour = factor(am)))
g <- g + geom_point(size = 6, colour = "black") + geom_point(size = 4)
g <- g + xlab("Weight (1000 lbs)") + ylab("Miles per gallon (mpg)")
g
1 line, 1 intercept, 1 slope
fitwt <- lm(mpg ~ wt, data = mtcars)
g1 <- g
g1 <- g1 + geom_abline(intercept = coef(fitwt)[1], slope = coef(fitwt)[2], size = 2)
g1
model1 <- lm(mpg ~ wt, data = mtcars)
kable(summary(model1)$coef)
| | Estimate | Std. Error | t value | Pr(>\|t\|) |
|---|---|---|---|---|
| (Intercept) | 37.285126 | 1.877627 | 19.857575 | 0 |
| wt | -5.344472 | 0.559101 | -9.559044 | 0 |
summary(model1)$r.squared
[1] 0.7528328
summary(model1)$adj.r.squared
[1] 0.7445939
2 lines, 2 intercepts, 1 slope. The lines are very close to each other.
fitwt_am <- lm(mpg ~ wt + factor(am), data = mtcars)
g2 <- g
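# Automatic (am = 0): baseline intercept; red, matching the later plots
g2 <- g2 + geom_abline(intercept = coef(fitwt_am)[1], slope = coef(fitwt_am)[2], size = 1, col = "red")
# Manual (am = 1): intercept shifted by the am coefficient, same slope
g2 <- g2 + geom_abline(intercept = coef(fitwt_am)[1] + coef(fitwt_am)[3], slope = coef(fitwt_am)[2], size = 1, col = "blue")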
g2
model2 <- lm(mpg ~ wt + factor(am), data = mtcars)
kable(summary(model2)$coef)
| | Estimate | Std. Error | t value | Pr(>\|t\|) |
|---|---|---|---|---|
| (Intercept) | 37.3215513 | 3.0546385 | 12.2179928 | 0.0000000 |
| wt | -5.3528114 | 0.7882438 | -6.7908072 | 0.0000002 |
| factor(am)1 | -0.0236152 | 1.5456453 | -0.0152786 | 0.9879146 |
summary(model2)$r.squared
[1] 0.7528348
summary(model2)$adj.r.squared
[1] 0.7357889
2 lines, 2 intercepts, 2 slopes. Interaction between weight and transmission
fitwt_Iam <- lm(mpg ~ wt * factor(am), data = mtcars)
g3 <- g
g3 <- g3 + geom_abline(intercept = coef(fitwt_Iam)[1], slope = coef(fitwt_Iam)[2], size = 2, col = "red")
g3 <- g3 + geom_abline(intercept = coef(fitwt_Iam)[1] + coef(fitwt_Iam)[3], slope = coef(fitwt_Iam)[2] + coef(fitwt_Iam)[4], size = 2, col = "blue")
g3
model3 <- lm(mpg ~ wt * factor(am), data = mtcars)
kable(summary(model3)$coef)
| | Estimate | Std. Error | t value | Pr(>\|t\|) |
|---|---|---|---|---|
| (Intercept) | 31.416055 | 3.0201093 | 10.402291 | 0.0000000 |
| wt | -3.785907 | 0.7856478 | -4.818836 | 0.0000455 |
| factor(am)1 | 14.878422 | 4.2640422 | 3.489276 | 0.0016210 |
| wt:factor(am)1 | -5.298361 | 1.4446993 | -3.667449 | 0.0010171 |
summary(model3)$r.squared
[1] 0.8330375
summary(model3)$adj.r.squared
[1] 0.8151486
2 lines, 2 intercepts, 2 slopes. Interaction between weight and transmission and adjustments for number of cylinders.
fitwt_Iamcyl <- lm(mpg ~ wt * factor(am) + factor(cyl), data = mtcars)
g4 <- g
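# Both lines are drawn for the reference level (4-cylinder cars).
# Automatic (am = 0): intercept and wt slope
g4 <- g4 + geom_abline(intercept = coef(fitwt_Iamcyl)[1], slope = coef(fitwt_Iamcyl)[2], size = 2, col = "red")
# Manual (am = 1): add the am coefficient [3] and the wt:am interaction [6]
g4 <- g4 + geom_abline(intercept = coef(fitwt_Iamcyl)[1] + coef(fitwt_Iamcyl)[3], slope = coef(fitwt_Iamcyl)[2] + coef(fitwt_Iamcyl)[6], size = 2, col = "blue")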
g4
model4 <- lm(mpg ~ wt * factor(am) + factor(cyl), data = mtcars)
kable(summary(model4)$coef)
| | Estimate | Std. Error | t value | Pr(>\|t\|) |
|---|---|---|---|---|
| (Intercept) | 29.774836 | 2.8403415 | 10.482836 | 0.0000000 |
| wt | -2.398713 | 0.8439884 | -2.842116 | 0.0086039 |
| factor(am)1 | 11.568790 | 4.0877912 | 2.830083 | 0.0088538 |
| factor(cyl)6 | -2.709777 | 1.3573517 | -1.996371 | 0.0564651 |
| factor(cyl)8 | -4.776110 | 1.5558306 | -3.069814 | 0.0049646 |
| wt:factor(am)1 | -4.067981 | 1.3974151 | -2.911075 | 0.0072955 |
summary(model4)$r.squared
[1] 0.8774548
summary(model4)$adj.r.squared
[1] 0.8538884
model_comparison <- anova(model1, model2, model3, model4)
print(kable(model_comparison))
| Res.Df | RSS | Df | Sum of Sq | F | Pr(>F) |
|---|---|---|---|---|---|
| 30 | 278.3219 | NA | NA | NA | NA |
| 29 | 278.3197 | 1 | 0.0022403 | 0.0004221 | 0.9837651 |
| 28 | 188.0077 | 1 | 90.3120314 | 17.0163295 | 0.0003372 |
| 26 | 137.9917 | 2 | 50.0159319 | 4.7119280 | 0.0179394 |
kable(round(sqrt(vif(model4)), digits = 1))
| | GVIF | Df | GVIF^(1/(2*Df)) |
|---|---|---|---|
| wt | 2.0 | 1.0 | 1.4 |
| factor(am) | 4.9 | 1.0 | 2.2 |
| factor(cyl) | 1.7 | 1.4 | 1.2 |
| wt:factor(am) | 4.3 | 1.0 | 2.1 |
par(mfrow=c(2,2))
plot(fitwt_Iamcyl)