Motor Trend, a magazine about the automobile industry, is interested in the relationship between a set of variables and miles per gallon (MPG), the outcome. They want answers to the following two questions, which this report addresses:
Is an automatic or manual transmission better for MPG? Quantify the MPG difference between automatic and manual transmissions.
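Using simple and multivariable linear regression, this report shows that mean mpg differs between automatic and manual transmissions. In the simple model, manual cars average 7.245 mpg more than automatic ones; after adjusting for weight (wt) and horsepower (hp), the estimated difference shrinks to 2.084 mpg and is no longer statistically significant at the 0.05 level (p = 0.141).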
library(knitr)
library(ggplot2)
mt <- mtcars
head(mt,10)
## mpg cyl disp hp drat wt qsec vs am gear carb
## Mazda RX4 21.0 6 160.0 110 3.90 2.620 16.46 0 1 4 4
## Mazda RX4 Wag 21.0 6 160.0 110 3.90 2.875 17.02 0 1 4 4
## Datsun 710 22.8 4 108.0 93 3.85 2.320 18.61 1 1 4 1
## Hornet 4 Drive 21.4 6 258.0 110 3.08 3.215 19.44 1 0 3 1
## Hornet Sportabout 18.7 8 360.0 175 3.15 3.440 17.02 0 0 3 2
## Valiant 18.1 6 225.0 105 2.76 3.460 20.22 1 0 3 1
## Duster 360 14.3 8 360.0 245 3.21 3.570 15.84 0 0 3 4
## Merc 240D 24.4 4 146.7 62 3.69 3.190 20.00 1 0 4 2
## Merc 230 22.8 4 140.8 95 3.92 3.150 22.90 1 0 4 2
## Merc 280 19.2 6 167.6 123 3.92 3.440 18.30 1 0 4 4
cs <- sapply(mt,class)
cs
## mpg cyl disp hp drat wt qsec
## "numeric" "numeric" "numeric" "numeric" "numeric" "numeric" "numeric"
## vs am gear carb
## "numeric" "numeric" "numeric" "numeric"
mt$am <- as.factor(mt$am)
levels(mt$am) <- c("Automatic","Manual")
head(mt,6)
## mpg cyl disp hp drat wt qsec vs am gear
## Mazda RX4 21.0 6 160 110 3.90 2.620 16.46 0 Manual 4
## Mazda RX4 Wag 21.0 6 160 110 3.90 2.875 17.02 0 Manual 4
## Datsun 710 22.8 4 108 93 3.85 2.320 18.61 1 Manual 4
## Hornet 4 Drive 21.4 6 258 110 3.08 3.215 19.44 1 Automatic 3
## Hornet Sportabout 18.7 8 360 175 3.15 3.440 17.02 0 Automatic 3
## Valiant 18.1 6 225 105 2.76 3.460 20.22 1 Automatic 3
## carb
## Mazda RX4 4
## Mazda RX4 Wag 4
## Datsun 710 1
## Hornet 4 Drive 1
## Hornet Sportabout 2
## Valiant 1
The effect of the car's transmission type on mpg is shown in a box plot (Figure 1 of the Appendix). The mean mpg for each transmission type is:
aggregate(mpg~am, data = mt, mean)
## am mpg
## 1 Automatic 17.14737
## 2 Manual 24.39231
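The gap between the two group means can be checked with a two-sample t-test before any regression is fitted. The following is a minimal sketch, not part of the original analysis; the object names `group_means`, `mean_diff` and `tt` are illustrative, and the choice of the Welch test (unequal variances, the default of `t.test`) is an assumption.

```r
# Rebuild the labelled data so the snippet runs on its own
mt <- transform(mtcars, am = factor(am, labels = c("Automatic", "Manual")))

# Difference in group means (Manual minus Automatic)
group_means <- tapply(mt$mpg, mt$am, mean)
mean_diff <- as.numeric(group_means["Manual"] - group_means["Automatic"])

# Welch two-sample t-test (unequal variances is an assumption made here)
tt <- t.test(mpg ~ am, data = mt)

mean_diff    # raw difference in mean mpg
tt$p.value   # p-value for the null hypothesis of equal means
tt$conf.int  # 95% interval for Automatic minus Manual (first level minus second)
```

Note that `t.test` reports the interval for the first factor level minus the second, so its sign is opposite to `mean_diff`.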
I use the correlation matrix to choose additional predictors. The full correlation matrix is plotted in Figure 2 of the Appendix; the correlations of each variable with mpg, sorted in ascending order, are:
mt1 <- sort(cor(mtcars)[1,])
mt1
## wt cyl disp hp carb qsec
## -0.8676594 -0.8521620 -0.8475514 -0.7761684 -0.5509251 0.4186840
## gear am vs drat mpg
## 0.4802848 0.5998324 0.6640389 0.6811719 1.0000000
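Because strong negative and strong positive correlations are equally informative, it can help to rank the variables by absolute correlation with mpg. This is a small sketch of that ranking; the names `r` and `ranked` are illustrative and do not appear in the original analysis.

```r
# Correlation of every variable with mpg, dropping mpg itself
r <- cor(mtcars)["mpg", ]
r <- r[names(r) != "mpg"]

# Rank by strength of association, ignoring the sign
ranked <- sort(abs(r), decreasing = TRUE)
round(ranked, 3)
```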
From these values and the plot in Figure 2, it is clear that wt has the strongest correlation with mpg in absolute value (-0.868; the sign is negative, so heavier cars get fewer miles per gallon). After this first analysis, I begin with a simple linear regression of mpg on am.
fit <- lm(mpg~am, data = mt)
summary(fit)
##
## Call:
## lm(formula = mpg ~ am, data = mt)
##
## Residuals:
## Min 1Q Median 3Q Max
## -9.3923 -3.0923 -0.2974 3.2439 9.5077
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 17.147 1.125 15.247 1.13e-15 ***
## amManual 7.245 1.764 4.106 0.000285 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 4.902 on 30 degrees of freedom
## Multiple R-squared: 0.3598, Adjusted R-squared: 0.3385
## F-statistic: 16.86 on 1 and 30 DF, p-value: 0.000285
The summary above includes the hypothesis tests for the model coefficients. From the intercept and the am coefficient, we see that automatic cars average 17.147 mpg, while cars with a manual transmission get 7.245 more miles per gallon on average (p = 0.000285). The R-squared value is almost 0.36, which means the model explains only 36% of the variance in mpg.
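With a single two-level factor as the predictor, the regression coefficients are just the group means in another form: the intercept is the mean of the reference level and the slope is the difference between the two means. The sketch below checks this; the names `b` and `group_means` are illustrative.

```r
# Rebuild the labelled data and the simple model so the snippet runs on its own
mt <- transform(mtcars, am = factor(am, labels = c("Automatic", "Manual")))
fit <- lm(mpg ~ am, data = mt)

b <- coef(fit)
group_means <- tapply(mt$mpg, mt$am, mean)

# Intercept equals the Automatic mean (the reference level)
all.equal(as.numeric(b["(Intercept)"]), as.numeric(group_means["Automatic"]))

# Intercept plus the amManual coefficient equals the Manual mean
all.equal(as.numeric(b["(Intercept)"] + b["amManual"]),
          as.numeric(group_means["Manual"]))
```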
Now we fit a multivariable linear regression of mpg on am with wt and hp as additional predictors, and use an ANOVA F-test to compare the two nested models.
mult <- lm(mpg~am + wt + hp, data = mt)
anova(fit, mult)
## Analysis of Variance Table
##
## Model 1: mpg ~ am
## Model 2: mpg ~ am + wt + hp
## Res.Df RSS Df Sum of Sq F Pr(>F)
## 1 30 720.90
## 2 28 180.29 2 540.61 41.979 3.745e-09 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
We use 0.05 as the significance level (type I error rate). The comparison yields a very small p-value of 3.745e-09, so we reject the null hypothesis that the coefficients of wt and hp are both zero: the multivariable model fits significantly better than the initial model.
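The F statistic in the ANOVA table can be reproduced by hand from the two residual sums of squares, which makes it clearer what is being tested. This is a sketch; the names `rss0`, `rss1`, `df0`, `df1`, `f_stat` and `p_val` are illustrative.

```r
# Rebuild the data and both models so the snippet runs on its own
mt <- transform(mtcars, am = factor(am, labels = c("Automatic", "Manual")))
fit  <- lm(mpg ~ am, data = mt)
mult <- lm(mpg ~ am + wt + hp, data = mt)

# Residual sums of squares and residual degrees of freedom
rss0 <- deviance(fit);  df0 <- df.residual(fit)
rss1 <- deviance(mult); df1 <- df.residual(mult)

# F = (drop in RSS per added parameter) / (residual variance of the larger model)
f_stat <- ((rss0 - rss1) / (df0 - df1)) / (rss1 / df1)
p_val  <- pf(f_stat, df0 - df1, df1, lower.tail = FALSE)

f_stat
p_val
```

With the values from the table, this is (540.61 / 2) / (180.29 / 28), which is approximately 41.98.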
As part of this work, I also check whether the residuals satisfy the assumptions of the analysis. The residual diagnostics are plotted in Figure 3.
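A residual check of this kind might look like the sketch below. It is not necessarily the code behind Figure 3: the four standard `plot.lm` diagnostics and the Shapiro-Wilk normality test are assumptions about what a reasonable check would include, and the names `res` and `sw` are illustrative.

```r
# Rebuild the data and the multivariable model so the snippet runs on its own
mt <- transform(mtcars, am = factor(am, labels = c("Automatic", "Manual")))
mult <- lm(mpg ~ am + wt + hp, data = mt)

res <- residuals(mult)

# Shapiro-Wilk test; a small p-value would suggest non-normal residuals
sw <- shapiro.test(res)
sw$p.value

# Standard diagnostic plots: residuals vs fitted, normal Q-Q,
# scale-location, and residuals vs leverage
op <- par(mfrow = c(2, 2))
plot(mult)
par(op)
```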
summary(mult)
##
## Call:
## lm(formula = mpg ~ am + wt + hp, data = mt)
##
## Residuals:
## Min 1Q Median 3Q Max
## -3.4221 -1.7924 -0.3788 1.2249 5.5317
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 34.002875 2.642659 12.867 2.82e-13 ***
## amManual 2.083710 1.376420 1.514 0.141268
## wt -2.878575 0.904971 -3.181 0.003574 **
## hp -0.037479 0.009605 -3.902 0.000546 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.538 on 28 degrees of freedom
## Multiple R-squared: 0.8399, Adjusted R-squared: 0.8227
## F-statistic: 48.96 on 3 and 28 DF, p-value: 2.908e-11
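The second model explains about 84% of the variance, as indicated by R-squared (0.8399). The am coefficient drops from 7.245 to 2.084 once wt and hp are included, which indicates that wt and hp confound the relationship between am and mpg. Holding wt and hp fixed, manual transmission cars get an estimated 2.084 more mpg than automatic ones, but this difference is not statistically significant at the 0.05 level (p = 0.141).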
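To quantify the uncertainty around the adjusted transmission effect, a confidence interval for the am coefficient is more informative than the point estimate alone. The sketch below compares the unadjusted and adjusted estimates and computes a 95% interval; the names `unadjusted`, `adjusted` and `ci` are illustrative.

```r
# Rebuild the data and both models so the snippet runs on its own
mt <- transform(mtcars, am = factor(am, labels = c("Automatic", "Manual")))
fit  <- lm(mpg ~ am, data = mt)
mult <- lm(mpg ~ am + wt + hp, data = mt)

# Manual-vs-automatic effect before and after adjusting for wt and hp
unadjusted <- as.numeric(coef(fit)["amManual"])
adjusted   <- as.numeric(coef(mult)["amManual"])
c(unadjusted = unadjusted, adjusted = adjusted)

# 95% confidence interval for the adjusted effect
ci <- confint(mult, "amManual", level = 0.95)
ci
```

An interval that includes zero is consistent with the non-significant p-value reported for amManual in the model summary.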