Using the “mtcars” data set in R, this report analyzes the relationship between transmission type (automatic or manual) and fuel economy (MPG). The exploratory analysis shows that the average MPG of manual-transmission cars is about 7.2 higher than that of automatic-transmission cars (24.392 vs. 17.147). The best-fitting model is selected with the step() function. According to that model, with weight and quarter-mile time held constant, manual transmission is associated with 2.9358 higher MPG than automatic transmission.
Figure 1 shows that cars with manual transmission have higher MPG. The difference between automatic and manual transmission is quantified in the next section.
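mtcars$am <- factor(mtcars$am)  # 0 = automatic, 1 = manual; a factor is needed for the am0/am1 labels below
fit1 <- lm(mpg ~ am - 1, data = mtcars)  # "- 1" drops the intercept, so each coefficient is a group mean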
summary(fit1)$coef
##     Estimate Std. Error  t value     Pr(>|t|)
## am0 17.14737   1.124603 15.24749 1.133983e-15
## am1 24.39231   1.359578 17.94109 1.376283e-17
The average MPG of automatic-transmission cars is 17.147, and the average MPG of manual-transmission cars is 24.392.
Several models are compared to select the one that best describes how transmission type affects MPG. First, a model with a single predictor (“am”) is fitted.
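As a cross-check on the no-intercept fit, the two coefficients should coincide with the plain group means of MPG. The following is a minimal sketch under the assumption that "am" is treated as a factor; the names `cars`, `group_means`, and `fit_means` are chosen for this illustration and do not appear in the original analysis.

```r
# Work on a local copy so the built-in mtcars data set is left untouched.
cars <- transform(mtcars, am = factor(am))  # 0 = automatic, 1 = manual

# Mean MPG per transmission type, computed directly.
group_means <- tapply(cars$mpg, cars$am, mean)

# The same quantities obtained as coefficients of a model without an intercept.
fit_means <- lm(mpg ~ am - 1, data = cars)

print(group_means)
print(coef(fit_means))

# TRUE when both approaches agree up to numerical tolerance.
print(isTRUE(all.equal(as.numeric(group_means), as.numeric(coef(fit_means)))))
```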
fit2 <- lm(mpg ~ am, data = mtcars)
summary(fit2)$r.squared
## [1] 0.3597989
The model has an R-squared value of 0.3598, so transmission type alone explains only about 36% of the variance in MPG.
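To make the R-squared figure concrete, it can be recomputed by hand as one minus the ratio of the residual sum of squares to the total sum of squares. This is only an illustrative sketch; `cars`, `fit_am`, `rss`, `tss`, and `r2_manual` are names introduced here, not part of the original report.

```r
# Local copy with transmission as a factor (assumed, as in the fits above).
cars <- transform(mtcars, am = factor(am))
fit_am <- lm(mpg ~ am, data = cars)

# Residual sum of squares: variation left unexplained by the model.
rss <- sum(residuals(fit_am)^2)

# Total sum of squares: variation of MPG around its overall mean.
tss <- sum((cars$mpg - mean(cars$mpg))^2)

# Share of the variation explained by transmission type alone.
r2_manual <- 1 - rss / tss
print(r2_manual)
print(summary(fit_am)$r.squared)  # should agree with r2_manual
```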
Next, a model with all variables in the data set as predictors is fitted.
fit3 <- lm(mpg ~ ., data = mtcars)
summary(fit3)$r.squared
## [1] 0.8816005
The model has an R-squared value of 0.8816. However, because it includes every variable, there is a risk of overfitting.
Stepwise selection with the step() function is used to choose the best model.
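One common way to gauge this risk is the adjusted R-squared, which penalizes each additional predictor. The sketch below recomputes it by hand for a full model fitted on the unmodified built-in mtcars data; because the report's preprocessing is not shown, the numbers it prints may differ from the 0.8816 reported above. All variable names are chosen for this illustration.

```r
# Full model on the raw built-in data (all columns numeric); this is an assumption,
# since the preprocessing used in the report is not shown.
fit_full <- lm(mpg ~ ., data = mtcars)

n  <- nrow(mtcars)               # number of observations
p  <- length(coef(fit_full)) - 1 # number of predictors, excluding the intercept
r2 <- summary(fit_full)$r.squared

# Adjusted R-squared: shrinks R-squared according to the number of predictors used.
adj_manual <- 1 - (1 - r2) * (n - 1) / (n - p - 1)

print(c(r.squared = r2, adj.r.squared = adj_manual))
```

A large gap between the two values would suggest that some predictors add little beyond what chance alone would contribute.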
summary(step(lm(mpg ~ ., data = mtcars), direction = "both", trace = 0))$coef
##              Estimate Std. Error   t value     Pr(>|t|)
## (Intercept)  9.617781  6.9595930  1.381946 1.779152e-01
## wt          -3.916504  0.7112016 -5.506882 6.952711e-06
## qsec         1.225886  0.2886696  4.246676 2.161737e-04
## am1          2.935837  1.4109045  2.080819 4.671551e-02
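With its default settings, step() ranks candidate models by AIC, where lower values are better. As a rough illustration of that criterion, the sketch below compares the single-predictor model, the three-predictor model, and the full model side by side. It uses the unmodified built-in mtcars data, and the names `fit_am`, `fit_sel`, `fit_full`, and `aic_table` are introduced only for this example.

```r
# Three candidate models, all fitted on the same 32 observations.
fit_am   <- lm(mpg ~ am, data = mtcars)              # transmission only
fit_sel  <- lm(mpg ~ wt + qsec + am, data = mtcars)  # predictors kept by step()
fit_full <- lm(mpg ~ ., data = mtcars)               # every available predictor

# AIC() accepts several fitted models and returns a data frame with one row per model.
aic_table <- AIC(fit_am, fit_sel, fit_full)
print(aic_table)
```

The model with the smallest AIC offers the best trade-off between fit and complexity among those listed.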
The step() function, which compares candidate models by AIC, retained three predictors: wt, qsec, and am. A model with these three variables is fitted below.
fit4 <- lm(mpg ~ wt + qsec + am, data = mtcars)
summary(fit4)[c("coefficients", "r.squared")]  # same elements as positions 4 and 8, selected by name
## $coefficients
##              Estimate Std. Error   t value     Pr(>|t|)
## (Intercept)  9.617781  6.9595930  1.381946 1.779152e-01
## wt          -3.916504  0.7112016 -5.506882 6.952711e-06
## qsec         1.225886  0.2886696  4.246676 2.161737e-04
## am1          2.935837  1.4109045  2.080819 4.671551e-02
##
## $r.squared
## [1] 0.8496636
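The model with three variables has an R-squared value of 0.8497. The p-values for wt, qsec, and am are all below 0.05; only the intercept is not significant (p = 0.178). Holding the other predictors constant, each additional 1000 lb of weight is associated with a decrease of 3.9165 MPG, each additional second of quarter-mile time is associated with an increase of 1.2259 MPG, and manual transmission is associated with 2.9358 higher MPG than automatic transmission.
The 95% confidence intervals for the model coefficients are shown below.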
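The "holding the other predictors constant" reading of the transmission coefficient can be illustrated with two hypothetical cars that share the same weight and quarter-mile time and differ only in transmission. This is a sketch: the values wt = 3 and qsec = 18 are arbitrary choices for illustration, and the names `cars`, `fit_sel`, `pair`, and `diff_mpg` do not come from the original report.

```r
# Local copy with transmission as a factor (0 = automatic, 1 = manual).
cars <- transform(mtcars, am = factor(am))
fit_sel <- lm(mpg ~ wt + qsec + am, data = cars)

# Two hypothetical cars: identical weight and quarter-mile time, different transmission.
pair <- data.frame(
  wt   = 3,    # weight in 1000 lb (arbitrary illustrative value)
  qsec = 18,   # quarter-mile time in seconds (arbitrary illustrative value)
  am   = factor(c("0", "1"), levels = levels(cars$am))
)

pred <- predict(fit_sel, newdata = pair)

# Predicted MPG of the manual car minus that of the automatic car.
diff_mpg <- unname(pred[2] - pred[1])
print(diff_mpg)
print(coef(fit_sel)["am1"])  # the same quantity, read off the coefficient table
```

Because the model is additive, the difference does not depend on the particular weight and quarter-mile time chosen.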
confint(fit4)
##                   2.5 %    97.5 %
## (Intercept) -4.63829946 23.873860
## wt          -5.37333423 -2.459673
## qsec         0.63457320  1.817199
## am1          0.04573031  5.825944
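For a linear model, these intervals are the estimate plus or minus a t quantile times the standard error. The sketch below reproduces them by hand, assuming "am" is a factor as in the output above; `cars`, `fit_sel`, `t_crit`, and `ci_manual` are names chosen for this illustration.

```r
# Local copy with transmission as a factor (0 = automatic, 1 = manual).
cars <- transform(mtcars, am = factor(am))
fit_sel <- lm(mpg ~ wt + qsec + am, data = cars)

coef_table <- coef(summary(fit_sel))
est <- coef_table[, "Estimate"]
se  <- coef_table[, "Std. Error"]

# Two-sided 95% critical value of the t distribution with the residual degrees of freedom.
t_crit <- qt(0.975, df = df.residual(fit_sel))

ci_manual <- cbind(lower = est - t_crit * se, upper = est + t_crit * se)
print(ci_manual)
print(confint(fit_sel))  # should match ci_manual row by row
```

The interval for am1 excludes zero, but its lower bound is close to zero, which is consistent with a p-value just under 0.05.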
As shown in Figure 2, the residuals vs. fitted, normal Q-Q, scale-location, and residuals vs. leverage plots do not show distinctive patterns.
Figure 1. Boxplot of automatic / manual transmission and MPG
Figure 2. Residual Diagnostics
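The plotting code for the two figures is not shown in the report. The sketch below is one possible way to produce comparable plots with base R graphics (the original may have used ggplot2 instead); the labels and the names `cars` and `fit_sel` are assumptions made for this example.

```r
# Local copy with readable transmission labels (labels are chosen for this sketch).
cars <- transform(mtcars, am = factor(am, labels = c("Automatic", "Manual")))

# Figure 1 analogue: MPG by transmission type.
boxplot(mpg ~ am, data = cars,
        xlab = "Transmission", ylab = "MPG",
        main = "MPG by transmission type")

# Figure 2 analogue: the four default diagnostic panels for a linear model
# (residuals vs. fitted, normal Q-Q, scale-location, residuals vs. leverage).
fit_sel <- lm(mpg ~ wt + qsec + am, data = cars)
op <- par(mfrow = c(2, 2))  # 2 x 2 grid so all four panels share one page
plot(fit_sel)
par(op)                     # restore the previous graphics settings
```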