
Executive Summary

Using the mtcars data set in R, a collection of 32 cars, this report analyzes the relationship between transmission type (automatic or manual) and fuel efficiency in miles per gallon (MPG). The exploratory analysis shows that manual cars have a much higher average MPG than automatic cars. A model was then selected by stepwise regression using the step() function. According to this model, holding weight and quarter-mile time constant, manual transmission is associated with 2.9358 higher MPG than automatic transmission.

Is an automatic or manual transmission better for MPG?

Figure 1 (see the Appendix) shows that cars with manual transmission tend to have higher MPG. The difference between automatic and manual transmissions is quantified in the next section.
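Figure 1 is not reproduced in this text. Below is a minimal base-R sketch of a comparable plot. It is not the code behind the original figure, and the axis labels are illustrative choices.

```r
# Sketch of a Figure 1-style plot: MPG by transmission type (am: 0 = automatic, 1 = manual)
mt <- datasets::mtcars
boxplot(mpg ~ am, data = mt,
        names = c("Automatic", "Manual"),
        xlab = "Transmission", ylab = "Miles per gallon (MPG)")

# The same call with plot = FALSE returns the statistics behind the boxes;
# row 3 of $stats holds the median of each group
box_stats <- boxplot(mpg ~ am, data = mt, plot = FALSE)
box_stats$stats[3, ]
```

In mtcars the medians are 17.3 MPG for automatic cars and 22.8 MPG for manual cars, in line with the higher MPG of manual cars.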

Quantify the MPG difference between automatic and manual transmissions.

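mtcars$am <- factor(mtcars$am)  # 0 = automatic, 1 = manual; a factor yields the am0/am1 rows
fit1 <- lm(mpg ~ am - 1, data = mtcars)  # no intercept: one mean per transmission type
summary(fit1)$coef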
##     Estimate Std. Error  t value     Pr(>|t|)
## am0 17.14737   1.124603 15.24749 1.133983e-15
## am1 24.39231   1.359578 17.94109 1.376283e-17

The mean MPG is 17.147 for automatic cars and 24.392 for manual cars, a raw difference of 7.245 MPG.
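In a no-intercept model on a factor, each coefficient is simply a group mean. As a cross-check, this sketch computes the means directly with tapply() and takes their difference.

```r
mt <- datasets::mtcars

# Mean MPG for each transmission type (am: 0 = automatic, 1 = manual)
group_means <- tapply(mt$mpg, mt$am, mean)
group_means

# Raw difference in mean MPG, manual minus automatic
raw_diff <- unname(group_means["1"] - group_means["0"])
raw_diff
```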

Next, several models are compared to select the one that best describes how transmission type affects MPG. First, a model with a single predictor, am, is fitted.

fit2 <- lm(mpg ~ am, data = mtcars)
summary(fit2)$r.squared
## [1] 0.3597989

This model has an R-squared of 0.3598, meaning transmission type alone explains only about 36% of the variance in MPG.
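For a single-predictor model, R-squared equals the squared correlation between the response and the predictor. Also, the slope on a 0/1 indicator equals the difference in group means. This sketch checks both facts, leaving am as the numeric 0/1 column from datasets::mtcars.

```r
mt <- datasets::mtcars
fit_am <- lm(mpg ~ am, data = mt)  # am kept numeric (0/1) here

# R-squared of a simple regression equals the squared Pearson correlation
r2_model <- summary(fit_am)$r.squared
r2_cor   <- cor(mt$mpg, mt$am)^2
c(model = r2_model, squared_cor = r2_cor)

# The slope on a 0/1 indicator is the manual-minus-automatic difference in means
coef(fit_am)[["am"]]
```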

Next, a model that uses all variables in the data set as predictors is fitted.

fit3 <- lm(mpg ~ ., data = mtcars)
summary(fit3)$r.squared
## [1] 0.8816005

This model has an R-squared of 0.8816. However, R-squared never decreases when predictors are added, and a model that includes every variable risks overfitting.
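Adjusted R-squared is a fairer basis for comparing models of different sizes because it penalizes the number of non-intercept coefficients $p$: $R^2_{adj} = 1 - (1 - R^2)\frac{n - 1}{n - p - 1}$. The sketch below computes it by hand for an all-variable model. It assumes only am is treated as a factor. If other columns are recoded as factors, the number of coefficients changes, and so do the values.

```r
mt <- datasets::mtcars
mt$am <- factor(mt$am)           # assumption: only am is a factor in this sketch
fit_all <- lm(mpg ~ ., data = mt)

n  <- nobs(fit_all)              # number of observations
p  <- length(coef(fit_all)) - 1  # number of non-intercept coefficients
r2 <- summary(fit_all)$r.squared

# Adjusted R-squared shrinks R-squared for each extra coefficient
adj_r2 <- 1 - (1 - r2) * (n - 1) / (n - p - 1)
c(r.squared = r2, adj.r.squared = adj_r2)
```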

Stepwise selection with the step() function, which adds and removes predictors to minimize AIC, is used to select a model.

summary(step(lm(mpg ~ ., data = mtcars), direction = "both", trace = 0))$coef
##              Estimate Std. Error   t value     Pr(>|t|)
## (Intercept)  9.617781  6.9595930  1.381946 1.779152e-01
## wt          -3.916504  0.7112016 -5.506882 6.952711e-06
## qsec         1.225886  0.2886696  4.246676 2.161737e-04
## am1          2.935837  1.4109045  2.080819 4.671551e-02
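step() compares candidate linear models by the AIC that extractAIC() reports, and it accepts a change only if AIC decreases. This sketch (again treating only am as a factor) checks that the selected model has a lower AIC than the full model and lists the predictors that were kept.

```r
mt <- datasets::mtcars
mt$am <- factor(mt$am)

full     <- lm(mpg ~ ., data = mt)
selected <- step(full, direction = "both", trace = 0)

# extractAIC() returns c(equivalent df, AIC); step() minimizes the second element
aic_full     <- extractAIC(full)[2]
aic_selected <- extractAIC(selected)[2]
c(full = aic_full, selected = aic_selected)

# Predictors retained by the stepwise search
attr(terms(selected), "term.labels")
```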

Stepwise selection retained three predictors: wt, qsec, and am. A model with these three variables is fitted explicitly below.

fit4 <- lm(mpg ~ wt + qsec + am, data = mtcars)
summary(fit4)[c("coefficients", "r.squared")]  # select summary components by name
## $coefficients
##              Estimate Std. Error   t value     Pr(>|t|)
## (Intercept)  9.617781  6.9595930  1.381946 1.779152e-01
## wt          -3.916504  0.7112016 -5.506882 6.952711e-06
## qsec         1.225886  0.2886696  4.246676 2.161737e-04
## am1          2.935837  1.4109045  2.080819 4.671551e-02
## 
## $r.squared
## [1] 0.8496636

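The model with 3 variables has an R-squared of 0.8497. The predictors wt, qsec, and am all have p-values below 0.05, but the intercept does not (p = 0.178). Holding the other variables constant, each additional 1,000 lb of weight (wt) decreases MPG by 3.9165, each additional second of quarter-mile time (qsec) increases MPG by 1.2259, and manual transmission is associated with 2.9358 higher MPG than automatic transmission.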
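The am1 coefficient is the difference in predicted MPG between two cars that are identical except for transmission. The sketch below predicts MPG for a hypothetical car, once as automatic and once as manual. Setting wt and qsec to their sample means is an arbitrary illustrative choice. Because the model has no interaction terms, any fixed values give the same difference.

```r
mt <- datasets::mtcars
mt$am <- factor(mt$am)
fit <- lm(mpg ~ wt + qsec + am, data = mt)

# Hypothetical car at the sample means of wt and qsec, in both transmission types
new_cars <- data.frame(
  wt   = mean(mt$wt),
  qsec = mean(mt$qsec),
  am   = factor(c("0", "1"), levels = levels(mt$am))
)
pred <- predict(fit, newdata = new_cars)

# Manual minus automatic: equals the am1 coefficient
unname(pred[2] - pred[1])
```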

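The 95% confidence intervals for the model coefficients are shown below. The interval for am1 excludes zero, but its lower bound (0.046) is close to zero, so the size of the transmission effect is estimated imprecisely.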

confint(fit4)
##                   2.5 %    97.5 %
## (Intercept) -4.63829946 23.873860
## wt          -5.37333423 -2.459673
## qsec         0.63457320  1.817199
## am1          0.04573031  5.825944
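For a linear model, confint() computes each interval as estimate ± critical t value × standard error. The t distribution uses the residual degrees of freedom, which is 28 here (32 observations minus 4 coefficients). This sketch reproduces the intervals by hand.

```r
mt <- datasets::mtcars
mt$am <- factor(mt$am)
fit <- lm(mpg ~ wt + qsec + am, data = mt)

est    <- coef(summary(fit))[, "Estimate"]
se     <- coef(summary(fit))[, "Std. Error"]
t_crit <- qt(0.975, df = df.residual(fit))  # two-sided 95% critical value

# Estimate plus/minus the critical t value times the standard error
manual_ci <- cbind(lower = est - t_crit * se, upper = est + t_crit * se)
manual_ci
```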

As shown in Figure 2, the residuals vs. fitted, normal Q-Q, scale-location, and residuals vs. leverage plots show no distinctive patterns. This suggests no obvious violation of the linear model's assumptions.
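The four diagnostic plots named above are the default panels that plot() draws for an lm object. Below is a minimal sketch that draws them in a 2-by-2 grid. It is not necessarily the code behind the original Figure 2.

```r
mt <- datasets::mtcars
mt$am <- factor(mt$am)
fit <- lm(mpg ~ wt + qsec + am, data = mt)

# plot.lm's default panels: residuals vs fitted, normal Q-Q,
# scale-location, and residuals vs leverage
old_par <- par(mfrow = c(2, 2))
plot(fit)
par(old_par)  # restore the previous plotting layout
```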

Appendix

Figure 1. Boxplot of MPG by transmission type (automatic vs. manual)

Figure 2. Residual Diagnostics