This project analyzes the Motor Trend Car Road Tests data (mtcars) in the R datasets package to investigate whether automatic or manual transmissions are better for MPG. It also quantifies the MPG difference between automatic and manual transmissions.
As the Summary of Data section in the Appendix indicates, this project uses a data frame with 32 observations of 11 variables.
As Figure-1 in the Appendix indicates, cars with automatic transmissions appear to have lower MPG than cars with manual transmissions.
Now, observe the fit of a simple linear regression that uses MPG as the dependent variable and transmission type as the only independent variable.
summary(lm(mpg ~ am, data=mtcars))$adj.r.squared
## [1] 0.3384589
From the result above, the adjusted R-squared is 0.338, which indicates that this model explains only 33.8% of the variance in MPG. Therefore, this project will use a multivariable linear regression.
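Because am is coded 0 for automatic and 1 for manual, the slope of this simple model is the difference between the two group means. The following is a minimal sketch of that check; the names fit_simple, slope, and mean_diff are introduced here for illustration only.

```r
# Simple regression of MPG on transmission type (am: 0 = automatic, 1 = manual)
fit_simple <- lm(mpg ~ am, data = mtcars)

# Slope of am from the fitted model
slope <- unname(coef(fit_simple)["am"])

# Difference in mean MPG: manual minus automatic
mean_diff <- mean(mtcars$mpg[mtcars$am == 1]) - mean(mtcars$mpg[mtcars$am == 0])

# With a single 0/1 predictor, the two quantities should agree
all.equal(slope, mean_diff)
```

The slope here should match the am estimate of the simple model reported in the Appendix.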
From the results of the Variable Correlation with MPG section in the Appendix, the variables wt, cyl, disp, and hp have a strong negative correlation with mpg (between -0.868 and -0.776). These variables will be used to build the multivariable linear regression in the following sections.
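The selection of strongly correlated variables can be sketched as a simple filter on the correlation with mpg. The cutoff of 0.7 is an assumption made for this illustration, not a threshold stated in the analysis, and the names cor_with_mpg, cutoff, and strong are hypothetical.

```r
# Correlation of every variable with mpg, dropping mpg's correlation with itself
cor_with_mpg <- cor(mtcars)["mpg", ]
cor_with_mpg <- cor_with_mpg[names(cor_with_mpg) != "mpg"]

# Assumed cutoff: keep predictors whose absolute correlation exceeds 0.7
cutoff <- 0.7
strong <- names(cor_with_mpg)[abs(cor_with_mpg) > cutoff]
strong
```

With this cutoff the filter keeps wt, cyl, disp, and hp, the same four variables chosen above.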
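From the results of the Model Comparison section in the Appendix, adjusting for wt, hp, cyl, and disp reduces the estimated effect of am on mpg from 7.24 to 1.56 MPG, and the am coefficient is not statistically significant in the adjusted model (p = 0.29). The results also show that the adjusted R-squared is 0.827, which indicates that this model explains 82.7% of the variance in MPG.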
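One way to make the comparison between the two models more formal is a nested-model F-test together with a confidence interval for the adjusted am effect. This is a sketch under the assumption that the usual linear model assumptions hold; it is not part of the original analysis, and the names fit_simple, fit_multi, model_test, and am_ci are introduced here.

```r
# The two models compared in this report
fit_simple <- lm(mpg ~ am, data = mtcars)
fit_multi  <- lm(mpg ~ am + wt + cyl + disp + hp, data = mtcars)

# F-test: do the four added covariates improve on the simple model?
model_test <- anova(fit_simple, fit_multi)
model_test

# 95% confidence interval for the transmission effect after adjustment
am_ci <- confint(fit_multi, "am", level = 0.95)
am_ci
```

Since the p-value for am in the adjusted model is about 0.29, the interval should contain zero, meaning the data do not rule out no MPG difference once weight, cylinders, displacement, and horsepower are accounted for.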
data(mtcars)
head(mtcars)
str(mtcars)
## 'data.frame': 32 obs. of 11 variables:
## $ mpg : num 21 21 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2 ...
## $ cyl : num 6 6 4 6 8 6 8 4 4 6 ...
## $ disp: num 160 160 108 258 360 ...
## $ hp : num 110 110 93 110 175 105 245 62 95 123 ...
## $ drat: num 3.9 3.9 3.85 3.08 3.15 2.76 3.21 3.69 3.92 3.92 ...
## $ wt : num 2.62 2.88 2.32 3.21 3.44 ...
## $ qsec: num 16.5 17 18.6 19.4 17 ...
## $ vs : num 0 0 1 1 0 1 0 1 1 1 ...
## $ am : num 1 1 1 0 0 0 0 0 0 0 ...
## $ gear: num 4 4 4 3 3 3 3 4 4 4 ...
## $ carb: num 4 4 1 1 2 1 4 2 2 4 ...
library(ggplot2)
mtcars_mp_transmission <- mtcars["mpg"]
# am is coded 0 = automatic, 1 = manual
mtcars_mp_transmission$transmission <- factor(mtcars$am, levels = c(0, 1), labels = c('automatic', 'manual'))
g <- ggplot(mtcars_mp_transmission, aes(x = transmission, y = mpg, colour = transmission))
g <- g + geom_boxplot()
g
aggregate(mpg~transmission, data = mtcars_mp_transmission, median)
library(car)
## Loading required package: carData
sort(vif(lm(mpg ~ . , data = mtcars)), decreasing = TRUE)
##      disp       cyl        wt        hp      carb      qsec      gear
## 21.620241 15.373833 15.164887  9.832037  7.908747  7.527958  5.357452
##        vs        am      drat
##  4.965873  4.648487  3.374620
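The variance inflation factors above can be reproduced by hand for a single predictor, which shows what the number measures: how well that predictor is explained by the others. This is a minimal sketch for disp using only base R; the names aux_fit, r2_disp, and vif_disp are introduced for illustration.

```r
# Auxiliary regression: disp on all other predictors (the response mpg is excluded)
aux_fit <- lm(disp ~ . - mpg, data = mtcars)

# VIF = 1 / (1 - R^2) of the auxiliary regression
r2_disp <- summary(aux_fit)$r.squared
vif_disp <- 1 / (1 - r2_disp)
vif_disp
```

The value should agree with the disp entry in the vif() output above.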
sort(round(cor(mtcars), 3)["mpg",])
##     wt    cyl   disp     hp   carb   qsec   gear     am     vs   drat
## -0.868 -0.852 -0.848 -0.776 -0.551  0.419  0.480  0.600  0.664  0.681
##    mpg
##  1.000
lm_summary <- summary(lm(mpg ~ am + wt + cyl + disp + hp, data = mtcars))
# p-values (4th column) of the predictors, intercept excluded, smallest first
sort(lm_summary$coefficients[-1, 4])
##          wt          hp         cyl          am        disp
## 0.007256888 0.055096587 0.113932156 0.289843011 0.304719404
lm_summary$adj.r.squared
## [1] 0.8272816
lm_summary <- summary(lm(mpg ~ am + wt + cyl + hp, data = mtcars))
# p-values (4th column) of the predictors after dropping disp
sort(lm_summary$coefficients[-1, 4])
##          wt          hp         cyl          am
## 0.008603218 0.078553374 0.211916611 0.314179886
lm_summary$adj.r.squared
## [1] 0.8266657
summary(lm(mpg ~ am, data = mtcars))$coefficients["am", "Estimate"]
## [1] 7.244939
summary(lm(mpg ~ am + wt + cyl + disp + hp, data = mtcars))$coefficients["am", "Estimate"]
## [1] 1.556492
par(mfrow = c(2, 2))
plot(lm(mpg ~ am + wt + cyl + disp + hp, data = mtcars))
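The diagnostic plots can be complemented with a numeric look at leverage. The sketch below flags cars whose hat value exceeds twice the average leverage; that rule of thumb is an assumption chosen for illustration, and the names lev, threshold, and high_leverage are hypothetical.

```r
# Same multivariable model as in the diagnostic plots
fit_multi <- lm(mpg ~ am + wt + cyl + disp + hp, data = mtcars)

# Leverage (hat value) of each car; the values sum to the number of coefficients
lev <- hatvalues(fit_multi)
n_coef <- length(coef(fit_multi))

# Assumed rule of thumb: flag cars above twice the average leverage
threshold <- 2 * n_coef / nrow(mtcars)
high_leverage <- names(lev)[lev > threshold]
high_leverage
```

Any cars listed here are worth checking against the Residuals vs Leverage panel before drawing conclusions from the fitted coefficients.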