Regression Model Project

Prepared by: Bernard Kiyanda, April 2015

Executive summary

Using a data set of cars (mtcars), we explore the relationship between a set of variables and miles per gallon (MPG), the outcome. We are particularly interested in the following two questions:
Is an automatic or manual transmission better for MPG?
What is the MPG difference between automatic and manual transmissions?

Using a simple linear regression, we found a significant difference between the mean MPG of automatic and manual transmission cars, with manual cars getting 7.245 more MPG on average.

Exploratory data analyses

We first look at how miles per gallon varies by transmission type (0 = automatic, 1 = manual); see the exploratory plot in the APPENDIX. As expected, manual transmissions appear to get better miles per gallon than automatic transmissions. The mean MPG for each transmission type is shown below:

aggregate(mpg~am, data = mtcars, mean)
##   am      mpg
## 1  0 17.14737
## 2  1 24.39231

Correlation of the Transmission Regressor

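Let’s determine whether the “Transmission” regressor (am, indicating either automatic or manual) is correlated with the other variables in the dataset. To do so, we regress mpg on all variables and compute the variance inflation factors.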

library(car); fit <- lm(mpg ~ . , data = mtcars); vif(fit)
## Warning: package 'car' was built under R version 3.1.3
##       cyl      disp        hp      drat        wt      qsec        vs 
## 15.373833 21.620241  9.832037  3.374620 15.164887  7.527958  4.965873 
##        am      gear      carb 
##  4.648487  5.357452  7.908747

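Here, the variance inflation factors are high for cyl, disp, hp, wt, carb and qsec (all above 7.5), indicating strong correlation among these regressors. The value for am is 4.65, so the transmission regressor is also correlated with the other variables, though less severely.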
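
The value reported for am can be cross-checked without the car package. The following is a minimal sketch, assuming the standard definition of the variance inflation factor, 1 / (1 - R-squared), where R-squared comes from regressing am on all the other regressors. The names aux and vif_am are illustrative and not part of the original analysis.

```r
# Auxiliary regression: am explained by every other regressor.
# mpg is excluded because it is the outcome of the full model, not a regressor.
aux <- lm(am ~ . - mpg, data = mtcars)

# Standard VIF definition: 1 / (1 - R^2) of the auxiliary regression
vif_am <- 1 / (1 - summary(aux)$r.squared)
vif_am
```

The result should agree with the am entry in the vif(fit) output above.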

Model 1: Simple linear regression model

The linear model plot in the APPENDIX of mpg versus Transmission (0 = automatic, 1 = manual) and the coefficient interpretation below both indicate higher mpg for cars with manual transmissions.

Coefficient interpretation - The slope is positive (7.245), indicating an increase in mean mpg when the transmission predictor changes from automatic (0) to manual (1).

lm1 <- lm(mtcars$mpg ~ mtcars$am)
summary(lm1)$coefficients
##              Estimate Std. Error   t value     Pr(>|t|)
## (Intercept) 17.147368   1.124603 15.247492 1.133983e-15
## mtcars$am    7.244939   1.764422  4.106127 2.850207e-04

The slope coefficient can thus be read directly as the difference in mean mpg between automatic and manual transmissions (24.392 - 17.147 = 7.245).
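
With a single 0/1 regressor, the least-squares slope equals the difference between the two group means. The sketch below checks this on mtcars; the variable names group_means, mean_diff and slope are illustrative only.

```r
# Mean mpg per transmission type (names "0" = automatic, "1" = manual)
group_means <- tapply(mtcars$mpg, mtcars$am, mean)

# Difference in means: manual minus automatic
mean_diff <- unname(group_means["1"] - group_means["0"])

# Slope of the simple regression of mpg on am
slope <- unname(coef(lm(mpg ~ am, data = mtcars))["am"])

# The two quantities agree up to floating-point error
all.equal(mean_diff, slope)
```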

Residuals - We now examine the residuals of Model 1 against the variable Transmission (am). The residual plot in the APPENDIX shows that the residuals, i.e. the vertical distances between the data points and the regression line, are more spread out for am = 1 (manual transmissions). The model's predictions are therefore less reliable for manual transmissions than for automatic transmissions.
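
The difference in residual spread can also be summarised numerically rather than read off the plot. This is a minimal sketch, assuming the standard deviation of the residuals within each transmission group is an adequate measure of spread; res_sd is an illustrative name.

```r
# Refit Model 1 so the example is self-contained
lm1 <- lm(mpg ~ am, data = mtcars)

# Standard deviation of the residuals within each transmission group
# (names "0" = automatic, "1" = manual)
res_sd <- tapply(resid(lm1), mtcars$am, sd)
res_sd
```

A larger value for group "1" than for group "0" corresponds to the wider scatter seen for manual transmissions.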

summary(lm(mpg~am, data = mtcars))
## 
## Call:
## lm(formula = mpg ~ am, data = mtcars)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -9.3923 -3.0923 -0.2974  3.2439  9.5077 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   17.147      1.125  15.247 1.13e-15 ***
## am             7.245      1.764   4.106 0.000285 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 4.902 on 30 degrees of freedom
## Multiple R-squared:  0.3598, Adjusted R-squared:  0.3385 
## F-statistic: 16.86 on 1 and 30 DF,  p-value: 0.000285

Interpreting the intercept and the slope, we say that, on average, automatic cars get 17.147 MPG and manual transmission cars get 7.245 MPG more. In addition, the R-squared value is 0.3598, which means that our model explains only 35.98% of the variance in mpg.
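
The R-squared figure can be reproduced from its definition, one minus the ratio of the residual sum of squares to the total sum of squares. The sketch below uses illustrative names (fit1, rss, tss, r2) that do not appear elsewhere in this report.

```r
# Simple model: mpg on transmission type
fit1 <- lm(mpg ~ am, data = mtcars)

# Residual sum of squares: variation left unexplained by the model
rss <- sum(resid(fit1)^2)

# Total sum of squares: variation of mpg around its overall mean
tss <- sum((mtcars$mpg - mean(mtcars$mpg))^2)

# Share of the variance explained by the model
r2 <- 1 - rss / tss
r2
```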

Model 2: Multivariate Linear Model

Next, we fit a multivariate linear regression of mpg on am, wt, and hp. With an overall F-test p-value of 2.908e-11 below, we reject the null hypothesis that all slope coefficients are zero. Note that this test compares the model with an intercept-only model, not with our simple model.

bestfit <- lm(mpg~am + wt + hp, data = mtcars)
summary(bestfit)
## 
## Call:
## lm(formula = mpg ~ am + wt + hp, data = mtcars)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -3.4221 -1.7924 -0.3788  1.2249  5.5317 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 34.002875   2.642659  12.867 2.82e-13 ***
## am           2.083710   1.376420   1.514 0.141268    
## wt          -2.878575   0.904971  -3.181 0.003574 ** 
## hp          -0.037479   0.009605  -3.902 0.000546 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2.538 on 28 degrees of freedom
## Multiple R-squared:  0.8399, Adjusted R-squared:  0.8227 
## F-statistic: 48.96 on 3 and 28 DF,  p-value: 2.908e-11

This model explains about 84% of the variance (R-squared = 0.8399). Moreover, we see that wt and hp did indeed confound the relationship between am and mpg (mostly wt). Reading the coefficient for am now, we say that, on average, manual transmission cars get 2.08 MPG more than automatic cars of the same weight and horsepower. This difference is not statistically significant at the 0.05 level (p = 0.141).
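
To compare the multivariate model directly with the simple model, a nested-model F-test can be used, and a confidence interval shows the uncertainty in the adjusted am coefficient. The following is a sketch only, assuming the two models are nested (the simple model's regressors are a subset of the multivariate model's) and using illustrative names (simple, multi, cmp, p_nested, ci_am).

```r
# Simple and multivariate models fitted on the same data
simple <- lm(mpg ~ am, data = mtcars)
multi  <- lm(mpg ~ am + wt + hp, data = mtcars)

# Nested-model F-test: does adding wt and hp reduce the residual
# sum of squares by more than chance alone would explain?
cmp <- anova(simple, multi)
p_nested <- cmp[["Pr(>F)"]][2]
p_nested

# 95% confidence interval for the am coefficient after adjustment
ci_am <- confint(multi, "am", level = 0.95)
ci_am
```

A small p_nested supports keeping wt and hp in the model, while an interval for am that contains zero is consistent with the non-significant am coefficient in the summary above.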

Appendix

Exploratory plot

boxplot(mpg~am, data = mtcars,
        xlab = "Transmission",
        ylab = "Miles per Gallon",
        main = "MPG by Transmission Type")

Simple linear regression model plot - Model 1

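# Fit Model 1 and overlay its fitted values on the raw data
lm1 <- lm(mtcars$mpg ~ mtcars$am)
plot(mtcars$am, mtcars$mpg, pch = 19, col = "blue",
     xlab = "Transmission (0 = automatic, 1 = manual)", ylab = "Miles per Gallon")
lines(mtcars$am, fitted(lm1), lwd = 3, col = "darkgrey")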

Residual plot for the simple linear model 1

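# Residuals of Model 1 plotted against transmission type
lm1.res <- resid(lm1)
plot(mtcars$am, lm1.res,
     ylab = "Residuals", xlab = "Transmission (0 = automatic, 1 = manual)",
     main = "Residuals of Model 1")
abline(h = 0)  # reference line at zero residual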