Regression Models Course Project

Author: Hannah Hon

Motor Trend is a magazine about the automobile industry. This report examines the relationship between a set of variables in the mtcars data set and miles per gallon (MPG), the outcome, to answer two questions posed by Motor Trend:

1. Is an automatic or manual transmission better for MPG?

2. Quantify the MPG difference between automatic and manual transmissions.

data(mtcars)
library(ggplot2)
## recode transmission (0 = automatic, 1 = manual) as a labelled factor
mtcars$am <- factor(mtcars$am, labels = c("Automatic", "Manual"))

Regression Model

## fit a linear model of mpg on transmission type; dropping the intercept (- 1)
## makes each coefficient the mean mpg of one transmission group
fit <- lm(mpg ~ am - 1, mtcars)
confint(fit)
##                2.5 %   97.5 %
## amAutomatic 14.85062 19.44411
## amManual    21.61568 27.16894
summary(fit)
## 
## Call:
## lm(formula = mpg ~ am - 1, data = mtcars)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -9.3923 -3.0923 -0.2974  3.2439  9.5077 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## amAutomatic   17.147      1.125   15.25 1.13e-15 ***
## amManual      24.392      1.360   17.94  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 4.902 on 30 degrees of freedom
## Multiple R-squared:  0.9487, Adjusted R-squared:  0.9452 
## F-statistic: 277.2 on 2 and 30 DF,  p-value: < 2.2e-16

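Summary 1:

The estimated mean MPG is 17.147 for automatic transmissions and 24.392 for manual transmissions, a difference of 7.245 MPG in favor of manual. The 95% confidence interval is 14.85 to 19.44 for automatic and 21.62 to 27.17 for manual. Because the two intervals do not overlap, we can say that manual transmission is better for MPG when no other variable is taken into account. The reported R-squared of 0.9487 should not be read as the share of MPG variance explained by transmission type: the model has no intercept, so R computes R-squared relative to zero rather than to the mean of MPG, which inflates it. Transmission type alone also leaves the other variables unaccounted for, so let's perform an analysis of variance on all of them.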
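The difference between the two transmission types can also be estimated directly by keeping the intercept. The following is a minimal sketch, assuming the same mtcars data; the names mt, fit_diff, diff_est, diff_ci and r2 are introduced here only for illustration and are not part of the original analysis. With an intercept, the amManual coefficient is the manual-minus-automatic difference in mean MPG, and R-squared is measured around the mean of MPG.

```r
## work on a local copy so the objects used elsewhere in the report are untouched
mt <- datasets::mtcars
mt$am <- factor(mt$am, labels = c("Automatic", "Manual"))

## with an intercept, Automatic is the reference level
fit_diff <- lm(mpg ~ am, data = mt)

diff_est <- unname(coef(fit_diff)["amManual"])  # manual minus automatic
diff_ci  <- confint(fit_diff)["amManual", ]     # 95% interval for the difference
r2       <- summary(fit_diff)$r.squared         # R-squared around the mean of mpg

diff_est
diff_ci
r2
```

The point estimate should match the gap between the two group means reported above; the interval and R-squared are what answer the second question for the unadjusted comparison.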

Multivariable Regression

fit2 <- aov(mpg ~ ., mtcars)
summary(fit2)
##             Df Sum Sq Mean Sq F value   Pr(>F)    
## cyl          1  817.7   817.7 116.425 5.03e-10 ***
## disp         1   37.6    37.6   5.353  0.03091 *  
## hp           1    9.4     9.4   1.334  0.26103    
## drat         1   16.5    16.5   2.345  0.14064    
## wt           1   77.5    77.5  11.031  0.00324 ** 
## qsec         1    3.9     3.9   0.562  0.46166    
## vs           1    0.1     0.1   0.018  0.89317    
## am           1   14.5    14.5   2.061  0.16586    
## gear         1    1.0     1.0   0.138  0.71365    
## carb         1    0.4     0.4   0.058  0.81218    
## Residuals   21  147.5     7.0                     
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

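From this summary we look for variables with a p-value below 0.05, which are cyl, disp and wt. Note that aov reports sequential sums of squares, so each p-value depends on the order in which the variables enter the formula.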
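Because the table above is built term by term, the result for a variable can change with its position in the formula. The sketch below illustrates this under the assumption that the same mtcars data are used; mt, am_first, am_last, ss_first and ss_last are hypothetical names chosen for the example. It compares the sum of squares attributed to am when it is entered first versus last, and then uses drop1 as one possible order-independent alternative.

```r
## local copy of the data with transmission as a labelled factor
mt <- datasets::mtcars
mt$am <- factor(mt$am, labels = c("Automatic", "Manual"))

## same four predictors, with am entered first versus last
am_first <- anova(lm(mpg ~ am + cyl + disp + wt, data = mt))
am_last  <- anova(lm(mpg ~ cyl + disp + wt + am, data = mt))

## sequential sum of squares attributed to am in each ordering
ss_first <- am_first["am", "Sum Sq"]
ss_last  <- am_last["am", "Sum Sq"]
c(first = ss_first, last = ss_last)

## marginal F tests: each term is tested given all the others,
## so the result does not depend on the order of the formula
drop1(lm(mpg ~ ., data = mt), test = "F")
```

The residual sum of squares is the same in both orderings; only the way the explained variation is split between the terms changes.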

## refit with the variables flagged above plus transmission, again without an intercept
fit3 <- lm(mpg ~ + cyl + disp + wt + am -1, mtcars)
summary(fit3)
## 
## Call:
## lm(formula = mpg ~ +cyl + disp + wt + am - 1, data = mtcars)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
## -4.318 -1.362 -0.479  1.354  6.059 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## cyl         -1.784173   0.618192  -2.886  0.00758 ** 
## disp         0.007404   0.012081   0.613  0.54509    
## wt          -3.583425   1.186504  -3.020  0.00547 ** 
## amAutomatic 40.898313   3.601540  11.356 8.68e-12 ***
## amManual    41.027379   3.008596  13.637 1.26e-13 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2.642 on 27 degrees of freedom
## Multiple R-squared:  0.9866, Adjusted R-squared:  0.9841 
## F-statistic:   397 on 5 and 27 DF,  p-value: < 2.2e-16

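Summary 2:

The R-squared of the multivariable regression is 0.9866, but this model also has no intercept, so the value is measured relative to zero and overstates the share of MPG variance explained. The p-values for cyl and wt are below 0.05, while disp is not significant (p = 0.545) once the other variables are included. The coefficients for automatic (40.898) and manual (41.027), which are the intercepts of the two groups, differ by only about 0.13 MPG, compared with 7.245 MPG in the single-variable model. This suggests that cyl and wt are confounding variables in the relationship between transmission and MPG.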
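The adjusted transmission effect can be read off more directly from the same model written with an intercept. This is a minimal sketch under the assumption that the predictors are the ones used in fit3; the names mt, fit_adj, adj_diff, adj_ci and adj_r2 are illustrative and do not appear in the original analysis.

```r
## local copy of the data with transmission as a labelled factor
mt <- datasets::mtcars
mt$am <- factor(mt$am, labels = c("Automatic", "Manual"))

## same predictors as fit3, but keeping the intercept
fit_adj <- lm(mpg ~ cyl + disp + wt + am, data = mt)

adj_diff <- unname(coef(fit_adj)["amManual"])  # manual minus automatic, adjusted
adj_ci   <- confint(fit_adj)["amManual", ]     # 95% interval for that difference
adj_r2   <- summary(fit_adj)$r.squared         # R-squared around the mean of mpg

adj_diff
adj_ci
adj_r2
```

This is the same fit as the no-intercept version, only reparameterized, so amManual equals the gap between the amManual and amAutomatic coefficients shown above.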

Appendix

## boxplot for MPG according to different transmission type
g <- ggplot(aes(x = am, y = mpg), data = mtcars)
g <- g + geom_boxplot(aes(fill = am), col = "blue")      
g <- g + xlab("Transmission") + ylab("MPG") + labs(title = "MPG on Transmission Type") 
g

From the boxplot we can see that manual transmissions have an overall higher MPG than automatic transmissions.

## Residual plot for the multivariable regression model
par(mfrow = c(2, 2))
plot(fit3)

## Scatterplot matrix for the data
pairs(mpg ~., mtcars)