Using a data set of cars (mtcars), we explore the relationship between a set of variables and miles per gallon (MPG), the outcome. We are particularly interested in the following two questions:
Is an automatic or manual transmission better for MPG?
What is the MPG difference between automatic and manual transmissions?
Using a simple linear regression, we determined that there is a significant difference between the mean MPG of automatic and manual transmission cars, with manual cars getting 7.245 MPG more on average.
First, we look at how MPG varies by transmission type (0 = automatic, 1 = manual); see the boxplot in the APPENDIX. As expected, manual transmissions appear to get better mileage than automatic transmissions. The mean MPG for each transmission type is shown below:
aggregate(mpg~am, data = mtcars, mean)
## am mpg
## 1 0 17.14737
## 2 1 24.39231
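As a quick check on whether the gap between these two group means could be due to chance, one option is a Welch two-sample t-test. This is a sketch that is not part of the original analysis; it assumes the built-in mtcars data, and the names `tt` and `mean_diff` are illustrative.

```r
# Welch two-sample t-test of mpg by transmission type
# (am: 0 = automatic, 1 = manual); does not assume equal variances
tt <- t.test(mpg ~ am, data = mtcars)

# Group means as estimated by the test: am = 0 first, then am = 1
group_means <- unname(tt$estimate)

# Difference in mean mpg, manual minus automatic
mean_diff <- group_means[2] - group_means[1]

print(tt)
print(mean_diff)
```

The difference computed here should match the gap between the two means in the table above.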
Let’s determine if the “Transmission” regressor (indicating either automatic or manual) is correlated to other variables in the dataset.
library(car)  # provides vif()
fit <- lm(mpg ~ ., data = mtcars)  # regress mpg on all other variables
vif(fit)  # variance inflation factor for each regressor
## Warning: package 'car' was built under R version 3.1.3
## cyl disp hp drat wt qsec vs
## 15.373833 21.620241 9.832037 3.374620 15.164887 7.527958 4.965873
## am gear carb
## 4.648487 5.357452 7.908747
Here, the variance inflation factors are high for cyl, disp, hp, wt, carb and qsec, indicating strong correlation among these regressors. The VIF for am is 4.65, so transmission type is also partly explained by the other variables.
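To make the variance inflation factor less of a black box, the sketch below recomputes it by hand for one regressor (wt) using the standard definition VIF = 1 / (1 - R²), where R² comes from regressing that regressor on all the others. It uses base R only; the names `aux`, `r2_wt` and `vif_wt` are illustrative.

```r
# Auxiliary regression: wt on every other predictor (the outcome mpg is excluded)
aux <- lm(wt ~ . - mpg, data = mtcars)

# Share of the variance in wt explained by the other predictors
r2_wt <- summary(aux)$r.squared

# Variance inflation factor for wt
vif_wt <- 1 / (1 - r2_wt)
print(vif_wt)

# Pairwise correlations of am with every other variable, for context
print(round(cor(mtcars)["am", ], 2))
```

The value of `vif_wt` should agree with the wt entry reported by vif(fit) above.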
The linear model plot of mpg versus Transmission (0 = automatic, 1 = manual) shown in the APPENDIX, together with the coefficient interpretation below, indicates that cars with manual transmissions have higher mpg.
Coefficients interpretation - The slope is positive (7.245), indicating an average increase of 7.245 mpg when the transmission predictor changes from automatic to manual.
lm1 <- lm(mtcars$mpg ~ mtcars$am)
summary(lm1)$coefficients
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 17.147368 1.124603 15.247492 1.133983e-15
## mtcars$am 7.244939 1.764422 4.106127 2.850207e-04
Thus the slope coefficient of the linear model is simply the increase in mean mpg from automatic to manual transmission.
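The link between the regression coefficients and the group means can be verified directly. The sketch below assumes the same simple model of mpg on am; the variable names are illustrative.

```r
# Simple regression of mpg on the 0/1 transmission indicator
fit_am <- lm(mpg ~ am, data = mtcars)
intercept <- unname(coef(fit_am)["(Intercept)"])
slope <- unname(coef(fit_am)["am"])

# Mean mpg within each transmission group
mean_auto <- mean(mtcars$mpg[mtcars$am == 0])
mean_manual <- mean(mtcars$mpg[mtcars$am == 1])

# The intercept is the automatic mean; the slope is the manual-minus-automatic gap
print(all.equal(intercept, mean_auto))
print(all.equal(slope, mean_manual - mean_auto))
```

With a single binary regressor, least squares fits each group by its own mean, which is why both comparisons hold.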
Residuals - We now investigate the residuals of the model for each value of the Transmission variable (am). The residual plot in the APPENDIX shows that the spread of the residuals, i.e. the vertical distances between the data points and the regression line, is greater for am=1 (manual transmissions). Therefore the model's predictions are less reliable for manual transmissions than for automatic transmissions.
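The visual impression from the residual plot can be backed with numbers. The sketch below compares the residual standard deviation in the two groups and, as one possible formal check that is not part of the original analysis, runs an F test for equal variances (which assumes roughly normal residuals). All names are illustrative.

```r
# Residuals from the simple model of mpg on transmission type
fit_am <- lm(mpg ~ am, data = mtcars)
res <- resid(fit_am)

# Residual standard deviation within each transmission group
res_sd <- c(automatic = sd(res[mtcars$am == 0]),
            manual    = sd(res[mtcars$am == 1]))
print(res_sd)

# F test comparing the two residual variances (manual over automatic)
vt <- var.test(res[mtcars$am == 1], res[mtcars$am == 0])
print(vt)
```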
summary(lm(mpg~am, data = mtcars))
##
## Call:
## lm(formula = mpg ~ am, data = mtcars)
##
## Residuals:
## Min 1Q Median 3Q Max
## -9.3923 -3.0923 -0.2974 3.2439 9.5077
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 17.147 1.125 15.247 1.13e-15 ***
## am 7.245 1.764 4.106 0.000285 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 4.902 on 30 degrees of freedom
## Multiple R-squared: 0.3598, Adjusted R-squared: 0.3385
## F-statistic: 16.86 on 1 and 30 DF, p-value: 0.000285
Interpreting the intercept and the coefficient, we say that, on average, automatic cars get 17.147 MPG and manual transmission cars get 7.245 MPG more. In addition, the R-squared value is 0.3598, which means that our model explains only 35.98% of the variance in MPG.
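For readers who want to see where the R-squared value comes from, the sketch below recomputes it from the residual and total sums of squares, and also as the squared correlation between mpg and am (the two agree in a simple regression). Variable names are illustrative.

```r
# Simple regression of mpg on transmission type
fit_am <- lm(mpg ~ am, data = mtcars)

# Residual sum of squares and total sum of squares
rss <- sum(resid(fit_am)^2)
tss <- sum((mtcars$mpg - mean(mtcars$mpg))^2)

# R-squared: share of the variance in mpg explained by the model
r2 <- 1 - rss / tss

# In a one-regressor model this equals the squared correlation
r2_cor <- cor(mtcars$mpg, mtcars$am)^2

print(c(r2 = r2, r2_cor = r2_cor))
```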
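Next, we fit a multivariable linear regression of mpg on am, wt, and hp. The overall F-test p-value of 2.908e-11 shown below lets us reject the null hypothesis that all slope coefficients are zero. Note that this test compares the model against an intercept-only model; a formal comparison with the simple am-only model requires a nested-model test such as anova().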
bestfit <- lm(mpg~am + wt + hp, data = mtcars)
summary(bestfit)
##
## Call:
## lm(formula = mpg ~ am + wt + hp, data = mtcars)
##
## Residuals:
## Min 1Q Median 3Q Max
## -3.4221 -1.7924 -0.3788 1.2249 5.5317
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 34.002875 2.642659 12.867 2.82e-13 ***
## am 2.083710 1.376420 1.514 0.141268
## wt -2.878575 0.904971 -3.181 0.003574 **
## hp -0.037479 0.009605 -3.902 0.000546 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.538 on 28 degrees of freedom
## Multiple R-squared: 0.8399, Adjusted R-squared: 0.8227
## F-statistic: 48.96 on 3 and 28 DF, p-value: 2.908e-11
This model explains about 84% of the variance (R-squared = 0.8399). Moreover, wt and hp did confound the relationship between am and mpg (mostly wt). Holding wt and hp fixed, manual transmission cars get an estimated 2.08 MPG more than automatic cars on average, but this coefficient is no longer statistically significant at the 0.05 level (p = 0.141).
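A direct comparison of the simple and the adjusted model can be sketched with a nested-model F test, along with a confidence interval for the adjusted am coefficient. This is an illustrative addition rather than part of the original analysis; the names `fit_simple`, `fit_multi`, `cmp`, `p_nested` and `ci_am` are assumptions.

```r
# Nested models: am only, then am adjusted for weight and horsepower
fit_simple <- lm(mpg ~ am, data = mtcars)
fit_multi <- lm(mpg ~ am + wt + hp, data = mtcars)

# F test of whether adding wt and hp improves on the am-only model
cmp <- anova(fit_simple, fit_multi)
p_nested <- cmp[["Pr(>F)"]][2]
print(cmp)

# 95% confidence interval for the transmission effect after adjustment
ci_am <- confint(fit_multi, "am", level = 0.95)
print(ci_am)
```

A small `p_nested` supports keeping wt and hp in the model, while an interval for am that spans zero is consistent with the non-significant adjusted coefficient.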
boxplot(mpg~am, data = mtcars,
xlab = "Transmission",
ylab = "Miles per Gallon",
main = "MPG by Transmission Type")
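# Simple linear model of mpg on transmission type
lm1 <- lm(mtcars$mpg ~ mtcars$am)
# Scatter plot of mpg against transmission with the fitted regression line
plot(mtcars$am, mtcars$mpg, pch = 19, col = "blue")
lines(mtcars$am, fitted(lm1), lwd = 3, col = "darkgrey")
# Residuals of the simple model plotted against transmission type
lm1.res <- resid(lm1)
plot(mtcars$am, lm1.res,
  ylab = "Residuals", xlab = "Transmission (0 = automatic, 1 = manual)",
  main = "Residuals of mpg on Transmission")
abline(0, 0)  # horizontal reference line at zero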