In this report, we explore the relationship between fuel efficiency, measured as MPG (miles per gallon), and transmission type (automatic or manual) in the mtcars data set, and we try to quantify the MPG difference between the two.
library(datasets)
data(mtcars)
head(mtcars)
##                    mpg cyl disp  hp drat    wt  qsec vs am gear carb
## Mazda RX4         21.0   6  160 110 3.90 2.620 16.46  0  1    4    4
## Mazda RX4 Wag     21.0   6  160 110 3.90 2.875 17.02  0  1    4    4
## Datsun 710        22.8   4  108  93 3.85 2.320 18.61  1  1    4    1
## Hornet 4 Drive    21.4   6  258 110 3.08 3.215 19.44  1  0    3    1
## Hornet Sportabout 18.7   8  360 175 3.15 3.440 17.02  0  0    3    2
## Valiant           18.1   6  225 105 2.76 3.460 20.22  1  0    3    1
The data set contains 32 observations of 11 variables.
# Treat engine shape (vs) and transmission (am) as categorical variables
mtcars$vs <- as.factor(mtcars$vs)
mtcars$am <- as.factor(mtcars$am)
mtcars$am.label <- factor(mtcars$am, labels = c("Automatic", "Manual")) # 0 = automatic, 1 = manual
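The label comment above assumes that am = 0 is automatic and am = 1 is manual. A minimal sanity check of that mapping, and of the group sizes used by every test that follows, could look like this (a sketch; the object names cars, mapping and group.sizes are illustrative and not part of the original report):

```r
library(datasets)

# 'cars' is an illustrative fresh copy of mtcars, so this check does not
# depend on the earlier chunks having been run
cars <- mtcars
cars$am.label <- factor(cars$am, levels = c(0, 1),
                        labels = c("Automatic", "Manual"))

# Cross-tabulate the raw 0/1 code against the new label:
# every 0 should fall under "Automatic" and every 1 under "Manual"
mapping <- table(cars$am, cars$am.label)
print(mapping)

# Number of cars in each transmission group
group.sizes <- table(cars$am.label)
print(group.sizes)
```

The off-diagonal cells of mapping should be zero; any non-zero count there would mean the labels were attached in the wrong order.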
summary(mtcars$mpg)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
##   10.40   15.43   19.20   20.09   22.80   33.90
boxplot(mpg ~ am.label, data = mtcars, col = c("green", "blue"), ylab = "Miles Per Gallon", xlab = "Transmission Type")
As the boxplot shows, cars with manual transmission tend to have higher MPG than cars with automatic transmission. We analyze this further in the remaining sections.
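To put numbers on what the boxplot shows, one option is to compute the per-group statistics that boxplot() is built from. This is a sketch; cars, five.num and medians are illustrative names, not objects from the report:

```r
library(datasets)

# Illustrative fresh copy of mtcars with the transmission label recreated
cars <- mtcars
cars$am.label <- factor(cars$am, levels = c(0, 1),
                        labels = c("Automatic", "Manual"))

# Tukey five-number summary per group (min, lower hinge, median, upper hinge,
# max). boxplot() draws each box from these hinges; its whiskers stop at the
# most extreme points within 1.5 IQR of the box
five.num <- tapply(cars$mpg, cars$am.label, fivenum)
print(five.num)

# Group medians: the thick line inside each box
medians <- tapply(cars$mpg, cars$am.label, median)
print(medians)
```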
The mean MPG values for cars with Automatic and Manual transmission are:
aggregate(mpg~am, data = mtcars, mean)
##   am      mpg
## 1  0 17.14737
## 2  1 24.39231
We hypothesize that automatic cars have lower MPG than manual cars, and we check whether the difference is statistically significant with a (two-sided) Welch two-sample t-test.
t.test(mtcars[mtcars$am.label == "Automatic",]$mpg, mtcars[mtcars$am.label == "Manual",]$mpg)
##
## Welch Two Sample t-test
##
## data: mtcars[mtcars$am.label == "Automatic", ]$mpg and mtcars[mtcars$am.label == "Manual", ]$mpg
## t = -3.7671, df = 18.332, p-value = 0.001374
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -11.280194 -3.209684
## sample estimates:
## mean of x mean of y
## 17.14737 24.39231
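Since the p-value (0.001374) is well below 0.05, we reject the null hypothesis of equal means: the difference is statistically significant. The 95% confidence interval for the difference in means (automatic minus manual) runs from -11.28 to -3.21 MPG.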
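The Welch statistic is the difference in group means divided by the unpooled standard error, with degrees of freedom from the Welch-Satterthwaite approximation. The sketch below recomputes it by hand (auto, manual and the other names are illustrative) and also gives the one-sided p-value matching the directional hypothesis "automatic < manual", which t.test() reports with alternative = "less":

```r
library(datasets)

# Illustrative split of mpg by transmission (am = 0 automatic, am = 1 manual)
auto   <- mtcars$mpg[mtcars$am == 0]
manual <- mtcars$mpg[mtcars$am == 1]

# Per-group variance of the mean, without pooling the variances
va <- var(auto) / length(auto)
vm <- var(manual) / length(manual)

# Welch t statistic: difference in means over the unpooled standard error
t.stat <- (mean(auto) - mean(manual)) / sqrt(va + vm)

# Welch-Satterthwaite approximation to the degrees of freedom
df.welch <- (va + vm)^2 /
  (va^2 / (length(auto) - 1) + vm^2 / (length(manual) - 1))

# Two-sided p-value (t.test()'s default) and one-sided p-value for
# the alternative "mean(auto) < mean(manual)"
p.two <- 2 * pt(-abs(t.stat), df.welch)
p.one <- pt(t.stat, df.welch)

print(c(t = t.stat, df = df.welch, p.two.sided = p.two, p.one.sided = p.one))
```

Because the observed t is negative, the one-sided p-value is exactly half of the two-sided one, so the directional hypothesis is supported even more strongly.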
Next, we fit a simple linear regression with mpg as the dependent variable and am as the independent variable.
lm.1 <- lm(mpg ~ am, data = mtcars)
summary(lm.1)
##
## Call:
## lm(formula = mpg ~ am, data = mtcars)
##
## Residuals:
##     Min      1Q  Median      3Q     Max
## -9.3923 -3.0923 -0.2974  3.2439  9.5077
##
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept)   17.147      1.125  15.247 1.13e-15 ***
## am1            7.245      1.764   4.106 0.000285 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 4.902 on 30 degrees of freedom
## Multiple R-squared: 0.3598, Adjusted R-squared: 0.3385
## F-statistic: 16.86 on 1 and 30 DF, p-value: 0.000285
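The p-value for am1 (0.000285) is well below 0.05, so we reject the null hypothesis that transmission type has no effect on MPG. The am1 estimate of 7.245 is the difference between the manual and automatic group means (24.392 - 17.147). However, the adjusted R-squared is only 0.3385, which means that transmission type alone explains only about 34% of the variability in mpg. For that reason, we fit a multiple linear regression that includes other predictors.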
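Two facts in the paragraph above can be checked directly: with a single 0/1 factor, the am1 slope equals the difference in group means, and R-squared is 1 - SSE/SST, with the adjusted version scaling by (n - 1)/(n - p - 1) for p predictors. A minimal sketch (fit, sse, sst and the other names are illustrative):

```r
library(datasets)

# Illustrative copy with am coded as a factor, as in the report
cars <- mtcars
cars$am <- as.factor(cars$am)
fit <- lm(mpg ~ am, data = cars)

# Intercept = automatic mean; am1 slope = manual mean minus automatic mean
group.means <- tapply(cars$mpg, cars$am, mean)
diff.means  <- as.numeric(group.means["1"]) - as.numeric(group.means["0"])

# R-squared = 1 - SSE / SST
n   <- nrow(cars)
p   <- 1                                   # one predictor (am1)
sse <- sum(residuals(fit)^2)
sst <- sum((cars$mpg - mean(cars$mpg))^2)
r2  <- 1 - sse / sst

# Adjusted R-squared penalises for the number of predictors
adj.r2 <- 1 - (1 - r2) * (n - 1) / (n - p - 1)

print(c(am1 = unname(coef(fit)["am1"]), diff.means = diff.means,
        r2 = r2, adj.r2 = adj.r2))
```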
A pairs plot of mpg against every other variable is presented in the Appendix. Among them, cyl, disp, hp and wt show the strongest correlation with mpg, so we include them in the model.
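The ranking behind that choice can also be read off numerically. The sketch below ranks Pearson correlations with mpg by absolute value; it uses the untouched datasets::mtcars because the report has converted vs and am to factors, which cor() does not accept. The names cars, r and ranked are illustrative:

```r
library(datasets)

# Untouched numeric copy of the data (all 11 columns are numeric here)
cars <- datasets::mtcars

# Pearson correlation of every other variable with mpg
r <- cor(cars)[, "mpg"]
r <- r[names(r) != "mpg"]

# Rank by strength of the linear association, ignoring sign
ranked <- r[order(abs(r), decreasing = TRUE)]
print(round(ranked, 3))
```

Pearson's r only captures linear association, and the top predictors are also strongly correlated with one another, so this ranking is a screening aid rather than a model-selection procedure.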
multi <- lm(mpg~am + cyl + disp + hp + wt, data = mtcars)
summary(multi)
##
## Call:
## lm(formula = mpg ~ am + cyl + disp + hp + wt, data = mtcars)
##
## Residuals:
##     Min      1Q  Median      3Q     Max
## -3.5952 -1.5864 -0.7157  1.2821  5.5725
##
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept) 38.20280    3.66910  10.412 9.08e-11 ***
## am1          1.55649    1.44054   1.080  0.28984
## cyl         -1.10638    0.67636  -1.636  0.11393
## disp         0.01226    0.01171   1.047  0.30472
## hp          -0.02796    0.01392  -2.008  0.05510 .
## wt          -3.30262    1.13364  -2.913  0.00726 **
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.505 on 26 degrees of freedom
## Multiple R-squared: 0.8551, Adjusted R-squared: 0.8273
## F-statistic: 30.7 on 5 and 26 DF, p-value: 4.029e-10
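This model explains about 83% of the variance in mpg (adjusted R-squared 0.8273, up from 0.3385), and the am1 coefficient drops from 7.245 to 1.556, so cyl, disp, hp and wt account for much of the association between mpg and am. Holding the other variables constant, manual cars are estimated to achieve about 1.56 MPG more than automatic cars, but this estimate is not statistically significant (p = 0.29), so once these variables are accounted for the data do not establish a transmission effect.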
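Two standard follow-ups support this reading: a nested-model F-test of whether cyl, disp, hp and wt jointly improve on the am-only model, and a confidence interval for the adjusted am1 effect. A sketch, refitting both models on an illustrative copy named cars:

```r
library(datasets)

# Illustrative copy with am coded as a factor, as in the report
cars <- mtcars
cars$am <- as.factor(cars$am)

simple <- lm(mpg ~ am, data = cars)
multi  <- lm(mpg ~ am + cyl + disp + hp + wt, data = cars)

# Nested-model F-test: do the four added predictors jointly improve the fit?
comparison <- anova(simple, multi)
print(comparison)

# 95% confidence interval for the manual-vs-automatic difference,
# holding cyl, disp, hp and wt fixed
am.ci <- confint(multi, "am1", level = 0.95)
print(am.ci)
```

An interval for am1 that straddles zero is the interval-estimate counterpart of the non-significant p-value reported above.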
Plot 2 in the Appendix shows the residual diagnostics for the multiple regression model.
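By default, plot() on an lm object draws four panels: Residuals vs Fitted, Normal Q-Q, Scale-Location and Residuals vs Leverage. The last panel is built from leverage (hat values) and Cook's distance, which can also be inspected numerically. A sketch, with lev and cook as illustrative names:

```r
library(datasets)

# Illustrative copy with am coded as a factor, then the report's model
cars <- mtcars
cars$am <- as.factor(cars$am)
multi <- lm(mpg ~ am + cyl + disp + hp + wt, data = cars)

# Leverage and Cook's distance for each of the 32 cars
lev  <- hatvalues(multi)
cook <- cooks.distance(multi)

# Hat values always sum to the number of estimated coefficients (here 6)
print(sum(lev))

# The three most influential cars by Cook's distance
print(head(sort(cook, decreasing = TRUE), 3))
```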
### Plot 1
pairs(mpg ~ ., data = mtcars)
### Plot 2
par(mfrow = c(2, 2))
plot(multi)