This project aims to answer the following two questions:
We conclude that:
We begin by inspecting the distribution of mpg for each transmission type (see Figure 1 in the appendix).
We also compare the mean and standard deviation of mpg across transmission types. As the summaries below show, manual cars (am = 1) have both a higher mean mpg (24.4 versus 17.1) and a larger standard deviation (6.167 versus 3.834) than automatic cars (am = 0).
data(mtcars)
summary(mtcars$mpg[mtcars$am==0])
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 10.4 15.0 17.3 17.1 19.2 24.4
sd(mtcars$mpg[mtcars$am==0])
## [1] 3.834
summary(mtcars$mpg[mtcars$am==1])
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 15.0 21.0 22.8 24.4 30.4 33.9
sd(mtcars$mpg[mtcars$am==1])
## [1] 6.167
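The summaries describe the two groups but do not test the difference formally. One way to do so, sketched here as an illustration rather than as part of the original analysis, is a Welch two-sample t-test on mpg by transmission type. The variable name `tt` is introduced only for this example.

```r
data(mtcars)

# Welch two-sample t-test (unequal variances) of mpg between
# automatic (am = 0) and manual (am = 1) cars.
tt <- t.test(mpg ~ am, data = mtcars)

tt$estimate   # group means
tt$conf.int   # 95% confidence interval for the difference in means
tt$p.value    # p-value for the null hypothesis of equal means
```

The Welch variant is used because the two groups have visibly different standard deviations, so an equal-variance assumption would be hard to justify.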
Here we choose models by AIC in a stepwise algorithm.
Starting from the full model with all predictors, the stepwise search selects the following model, where am is coded 0 for automatic and 1 for manual.
lm1 <- step(lm(mpg ~ ., data=mtcars))
Residual standard error for this model is:
summary(lm1)$sigma
## [1] 2.459
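To see which predictors the stepwise search kept, the selected formula and the sequence of steps can be inspected directly. This is a minimal sketch. The name `sel` is hypothetical, and `trace = 0` is set only to suppress the step-by-step printout.

```r
data(mtcars)

# Stepwise selection by AIC, starting from the model with all predictors.
# With no scope argument, step() searches backward from this model.
sel <- step(lm(mpg ~ ., data = mtcars), trace = 0)

formula(sel)  # formula of the selected model
sel$anova     # terms dropped at each step, with the resulting AIC
```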
The coefficients suggest that, holding wt and qsec fixed, manual transmissions get an additional 2.94 mpg compared with automatic transmissions.
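The point estimate can be read off the fitted model together with a confidence interval, which indicates how precisely the transmission effect is estimated. The sketch below refits the selected formula directly so that it runs on its own. The name `fit` is introduced only for this example.

```r
data(mtcars)

# Refit the model chosen by the stepwise search.
fit <- lm(mpg ~ wt + qsec + am, data = mtcars)

# Estimated mpg difference for manual (am = 1) versus automatic (am = 0),
# holding wt and qsec fixed.
coef(fit)["am"]

# 95% confidence interval for that coefficient.
confint(fit, "am", level = 0.95)
```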
To assess whether wt and qsec contribute meaningfully to the model, we fit models with and without each of them and compare their residual standard errors.
The second model is:
lm2 <- lm(mpg ~ am + wt, data=mtcars)
Residual standard error for this model is:
summary(lm2)$sigma
## [1] 3.098
The third model is:
lm3 <- lm(mpg ~ am + qsec, data=mtcars)
Residual standard error for this model is:
summary(lm3)$sigma
## [1] 3.487
The fourth model is:
lm4 <- lm(mpg ~ am, data=mtcars)
Residual standard error for this model is:
summary(lm4)$sigma
## [1] 4.902
As we can see, the first model (mpg ~ wt + qsec + am) has the smallest residual standard error (2.459, versus 3.098, 3.487, and 4.902 for the other three models).
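Residual standard error is one yardstick. As a complementary check, not part of the original analysis, nested models can be compared with an F-test via `anova()` and ranked by `AIC()`. The names `m_am`, `m_wt`, `m_full`, and `aic_table` are hypothetical. The am + qsec model is left out because it is not nested between the other two.

```r
data(mtcars)

# A nested sequence: each model adds one predictor to the previous one.
m_am   <- lm(mpg ~ am, data = mtcars)
m_wt   <- lm(mpg ~ am + wt, data = mtcars)
m_full <- lm(mpg ~ am + wt + qsec, data = mtcars)

# Sequential F-tests: does each added predictor reduce the residual
# sum of squares by more than chance would explain?
anova(m_am, m_wt, m_full)

# AIC for the same three models; lower is better.
aic_table <- AIC(m_am, m_wt, m_full)
aic_table
```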
We visualize the residuals in Figure 2 of the appendix. The plots show no evident relationship between the residuals and the fitted values.
boxplot(mpg ~ am, data=mtcars, main='Figure 1. Box plot of mpg by transmission type', xlab='Transmission (0 = automatic, 1 = manual)', ylab='mpg')
par(mfrow = c(2, 2))
plot(lm1)