Executive Summary

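This report uses regression analysis on the mtcars dataset in R to answer two questions: is an automatic or manual transmission better for MPG, and how large is the MPG difference between the two?

The analysis shows that cars with automatic transmission get fewer miles per gallon than cars with manual transmission. Holding weight (wt) and quarter-mile time (qsec) constant, the estimated difference is about 2.9 mpg. We also examine residual plots and diagnostics to check how well the model fits.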

Exploratory Data Analyses

First, we use a boxplot to compare miles per gallon between automatic and manual transmissions.

From Figure 1 in the appendix, we can see that cars with automatic transmission tend to have lower mpg than cars with manual transmission. We investigate this observation further in the next section.
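
The visual comparison can be backed by a few summary numbers. The following is a minimal sketch, assuming the coding documented for mtcars (am = 0 is automatic, am = 1 is manual); the names trans, mpg_mean and mpg_median are introduced here for illustration only.

```r
# Label the transmission codes: in mtcars, am = 0 is automatic and am = 1 is manual
trans <- factor(mtcars$am, levels = c(0, 1), labels = c("automatic", "manual"))

# Number of cars in each group
print(table(trans))

# Mean and median mpg per transmission type
mpg_mean <- tapply(mtcars$mpg, trans, mean)
mpg_median <- tapply(mtcars$mpg, trans, median)
print(round(rbind(mean = mpg_mean, median = mpg_median), 2))
```

The group mean for automatic cars is the same number that appears as the intercept of the single-variable regression in the next section.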

Regression Analysis and Model Selection

Is an automatic or manual transmission better for MPG?

As Figure 1 in the appendix shows, transmission type appears to be associated with mpg. We can test this with a single-variable linear regression. In mtcars, am = 0 denotes an automatic transmission and am = 1 a manual one.

fit1 <- lm(mpg ~ factor(am), data = mtcars)
summary(fit1)$coeff
##              Estimate Std. Error   t value     Pr(>|t|)
## (Intercept) 17.147368   1.124603 15.247492 1.133983e-15
## factor(am)1  7.244939   1.764422  4.106127 2.850207e-04
paste("R-squared =", summary(fit1)$r.squared)
## [1] "R-squared = 0.359798943425465"

Because Pr(>|t|) for am = 1 is much less than 0.05, we reject the null hypothesis that \(\beta_{am=1} = 0\). Thus am is a significant regressor for mpg. The fitted regression for mpg is

paste("mpg =",round(fit1$coefficients[1],3), "+", round(fit1$coefficients[2],3),"* am")
## [1] "mpg = 17.147 + 7.245 * am"

The 95% confidence interval for \(\beta_{am=1}\) lies entirely above zero.

confint(fit1)
##                2.5 %   97.5 %
## (Intercept) 14.85062 19.44411
## factor(am)1  3.64151 10.84837

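Both the point estimate of \(\beta_{am=1}\) and its entire 95% confidence interval are greater than zero, so \(mpg_{am=1} > mpg_{am=0}\). Since am = 1 is manual, manual transmission is better for mpg: on average, manual cars get about 7.245 more miles per gallon than automatic cars. However, this model has a low R-squared (about 0.36), so transmission type alone explains only about 36% of the variation in mpg.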
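
As a cross-check on the single-variable model, the slope on am should equal the plain difference between the two group means, and a two-sample t-test should point the same way. The sketch below refits the model so that it is self-contained; diff_means, slope and welch are illustrative names, and t.test is used with its default Welch (unequal variance) setting, which is a slightly different test from the pooled-variance t-test implied by lm.

```r
# Refit the single-variable model so the example stands on its own
fit1 <- lm(mpg ~ factor(am), data = mtcars)

# Difference in mean mpg between am = 1 and am = 0
group_means <- tapply(mtcars$mpg, mtcars$am, mean)
diff_means <- unname(group_means["1"] - group_means["0"])

# With a single two-level factor, the slope is exactly that difference
slope <- unname(coef(fit1)["factor(am)1"])
print(all.equal(diff_means, slope))

# Welch two-sample t-test on the same comparison
welch <- t.test(mpg ~ am, data = mtcars)
print(welch$p.value)
```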

Quantify the MPG difference between automatic and manual transmissions

We need a better-fitting model to quantify the mpg difference between automatic and manual transmissions more precisely. We use the step function to select one. It starts from the model that includes all variables and then uses backward elimination based on AIC, removing unnecessary variables one at a time.

full <- lm(mpg ~ ., data = mtcars)
best <- step(full, direction="backward",trace = 0)
paste("best model is", best$call[2])
## [1] "best model is mpg ~ wt + qsec + am"
fit2 <- lm(formula = mpg ~ wt + qsec + factor(am), data = mtcars)
summary(fit2)$coeff
##              Estimate Std. Error   t value     Pr(>|t|)
## (Intercept)  9.617781  6.9595930  1.381946 1.779152e-01
## wt          -3.916504  0.7112016 -5.506882 6.952711e-06
## qsec         1.225886  0.2886696  4.246676 2.161737e-04
## factor(am)1  2.935837  1.4109045  2.080819 4.671551e-02
paste("R-squared =", summary(fit2)$r.squared)
## [1] "R-squared = 0.849663556361707"

We can see that Pr(>|t|) for wt, qsec, and am are all less than 0.05, so all three are significant regressors for mpg at the 5% level, although am (p ≈ 0.047) is only marginally so. The R-squared shows that 84.97 percent of the total variation in mpg is explained by this regression model.
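
One way to check that the larger model is a real improvement over the single-variable one is a nested-model F-test, together with AIC and adjusted R-squared. This is a sketch rather than part of the original analysis; it refits the models so it can run on its own, and cmp, aic and adj are illustrative names.

```r
# Refit the models compared in this report
fit1 <- lm(mpg ~ factor(am), data = mtcars)
fit2 <- lm(mpg ~ wt + qsec + factor(am), data = mtcars)
full <- lm(mpg ~ ., data = mtcars)

# F-test: does adding wt and qsec to the am-only model reduce the residual error?
cmp <- anova(fit1, fit2)
print(cmp)

# AIC for the three candidate models (lower is better)
aic <- AIC(fit1, fit2, full)
print(aic)

# Adjusted R-squared penalizes the extra regressors
adj <- c(fit1 = summary(fit1)$adj.r.squared,
         fit2 = summary(fit2)$adj.r.squared)
print(round(adj, 3))
```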

The final regression for mpg is

paste("mpg =",round(fit2$coefficients[1],3),"+", round(fit2$coefficients[2],3),
"* wt +",round(fit2$coefficients[3],3),"* qsec +",round(fit2$coefficients[4],3),"* am")
## [1] "mpg = 9.618 + -3.917 * wt + 1.226 * qsec + 2.936 * am"

The 95% confidence interval for \(\beta_{am=1}\) lies entirely above zero.

paste("95% confidence interval for beta(am) is ",round(confint(fit2)[4],3), " to ", round(confint(fit2)[8],3))
## [1] "95% confidence interval for beta(am) is  0.046  to  5.826"

Both the point estimate of \(\beta_{am=1}\) and its 95% confidence interval are greater than zero. Holding wt and qsec constant, a car with automatic transmission gets an estimated 2.936 fewer miles per gallon than a car with manual transmission.
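
The am coefficient is much smaller here than in the single-variable model (2.936 versus 7.245). A likely reason, offered as an interpretation rather than something the regression output proves, is that transmission type is related to weight in this dataset. The sketch below compares mean weight by transmission; wt_by_am and cor_am_wt are illustrative names.

```r
# Mean weight (in 1000 lbs) by transmission type; am = 0 automatic, am = 1 manual
wt_by_am <- tapply(mtcars$wt, mtcars$am, mean)
names(wt_by_am) <- c("automatic", "manual")
print(round(wt_by_am, 3))

# Correlation between the transmission indicator and weight
cor_am_wt <- cor(mtcars$am, mtcars$wt)
print(round(cor_am_wt, 3))
```

If manual cars are lighter on average, part of the raw mpg gap in the single-variable model reflects weight rather than the transmission itself.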

For example

y <- predict(fit2,newdata = data.frame(wt=c(1,1), qsec=c(1,1), am=c(1,0)))
y
##        1        2 
## 9.863000 6.927163

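Here wt and qsec are both fixed at 1 for am = 1 and am = 0, so \(mpg_{am=1} - mpg_{am=0} = 9.863 - 6.927 = 2.936 = \beta_{am=1}\). These values of wt and qsec lie outside the range observed in mtcars; they serve only to illustrate that the difference equals the coefficient.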
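
The same comparison can be made at more typical values. The sketch below, which assumes the same model formula as fit2 and refits it to stay self-contained, predicts mpg for an automatic and a manual car at the sample means of wt and qsec; typical, pred and gap are illustrative names.

```r
# Refit the final model so the example stands on its own
fit2 <- lm(mpg ~ wt + qsec + factor(am), data = mtcars)

# Two hypothetical cars at the average weight and quarter-mile time,
# differing only in transmission (am = 0 automatic, am = 1 manual)
typical <- data.frame(wt = mean(mtcars$wt),
                      qsec = mean(mtcars$qsec),
                      am = c(0, 1))

# Point predictions with 95% confidence intervals for the mean response
pred <- predict(fit2, newdata = typical, interval = "confidence")
rownames(pred) <- c("automatic", "manual")
print(round(pred, 3))

# The gap between the two predictions equals the am coefficient
gap <- pred["manual", "fit"] - pred["automatic", "fit"]
print(round(gap, 3))
```

Because the model has no interaction terms, the gap is the same at any fixed values of wt and qsec.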

Residual Plot And Diagnostics

In this section we check how well our model fits. See the residual plots in Figure 2 in the appendix.

For Residuals vs Fitted, the model looks reasonable because the residuals are spread roughly evenly around a horizontal line without distinct patterns, although the plot flags a few possible outliers: Chrysler Imperial, Fiat 128, and Toyota Corolla.

For Normal Q-Q, the residuals lie close to the theoretical quantile line except for a few outliers, so they appear to be approximately normally distributed.

For Residuals vs Leverage, no points fall outside the Cook’s distance contours (dashed lines), so these outliers are not unduly influential on the regression results.
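
The plots can be complemented with numeric diagnostics. The following is a sketch under the assumption that the usual rules of thumb apply (plot.lm draws its Cook's distance contours at 0.5 and 1); it refits the final model to stay self-contained, and top_resid, cooks, max_cook, lev and sw are illustrative names.

```r
# Refit the final model so the example stands on its own
fit2 <- lm(mpg ~ wt + qsec + factor(am), data = mtcars)

# Cars with the three largest absolute residuals
res <- residuals(fit2)
top_resid <- head(sort(abs(res), decreasing = TRUE), 3)
print(round(top_resid, 3))

# Cook's distance: values above roughly 0.5 to 1 are commonly treated as influential
cooks <- cooks.distance(fit2)
max_cook <- max(cooks)
print(round(head(sort(cooks, decreasing = TRUE), 3), 3))

# Leverage (hat values) of the most extreme observations
lev <- hatvalues(fit2)
print(round(head(sort(lev, decreasing = TRUE), 3), 3))

# Shapiro-Wilk test of residual normality (a small p-value suggests non-normality)
sw <- shapiro.test(res)
print(sw$p.value)
```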

Conclusions

Cars with manual transmission get better mpg than cars with automatic transmission. In the single-variable model the difference is about 7.245 mpg; after adjusting for wt and qsec it is about 2.936 mpg, with a 95% confidence interval of 0.046 to 5.826. The residual diagnostics do not indicate serious problems with the final model.

Appendix

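library(ggplot2)
# Work on a copy so the original mtcars data frame is left unchanged
my_mtcars <- mtcars
# am is coded 0 = automatic, 1 = manual; relabel the factor levels accordingly
my_mtcars$transmissions <- factor(mtcars$am)
levels(my_mtcars$transmissions) <- c("automatic","manual")
# Figure 1: boxplot of mpg by transmission type
g <- ggplot(data = my_mtcars, aes(y=mpg, x=transmissions, fill = transmissions)) + geom_boxplot()
g <- g + labs(title = "Figure 1: MPG between Automatic and Manual Transmission")
g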

# Figure 2: diagnostic plots for the final model (fit2) on a 2 x 2 grid
par(mfrow=c(2,2),cex=1.4)
# which = 1, 2, 5 selects Residuals vs Fitted, Normal Q-Q, and Residuals vs Leverage
plot(fit2,which = 1,caption = ""); title(main="Residuals vs Fitted",font.main=1)
plot(fit2,which = 2,caption = ""); title(main="Normal Q-Q",font.main=1)
plot(fit2,which = 5,caption = ""); title(main="Residuals vs Leverage",font.main=1)
mtext("Figure 2: Residuals Plots", side = 3, line = -2, outer = TRUE, cex = 1.7)