Author: Hannah Hon
Motor Trend is a magazine about the automobile industry. This report examines the relationship between a set of variables and miles per gallon (MPG), the outcome, in order to answer two questions from Motor Trend magazine. Here are the two questions:
data(mtcars)
library(ggplot2)
mtcars$am <- factor(mtcars$am,labels=c("Automatic","Manual"))
## fit a linear model for the outcome mpg and variable transmission
fit <- lm(mpg ~ am - 1, mtcars)
confint(fit)
##                2.5 %   97.5 %
## amAutomatic 14.85062 19.44411
## amManual    21.61568 27.16894
summary(fit)
##
## Call:
## lm(formula = mpg ~ am - 1, data = mtcars)
##
## Residuals:
##     Min      1Q  Median      3Q     Max
## -9.3923 -3.0923 -0.2974  3.2439  9.5077
##
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## amAutomatic   17.147      1.125   15.25 1.13e-15 ***
## amManual      24.392      1.360   17.94  < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 4.902 on 30 degrees of freedom
## Multiple R-squared: 0.9487, Adjusted R-squared: 0.9452
## F-statistic: 277.2 on 2 and 30 DF, p-value: < 2.2e-16
Summary 1:
The estimated mean MPG is 17.147 for automatic transmission and 24.392 for manual transmission, a difference of about 7.2 MPG. The 95% confidence interval is 14.85 to 19.44 for automatic and 21.62 to 27.17 for manual. The intervals do not overlap, so on this evidence alone manual transmission is better for MPG. The reported R-squared of 0.9487 should be read with care: because the model has no intercept (the - 1 in the formula), R computes it relative to zero rather than to the mean MPG, so it does not mean that transmission type explains 94.87% of the variance in MPG. Let's perform an analysis of variance to see which other variables matter.
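The difference between the two transmission types can also be estimated directly. The sketch below is not part of the original analysis: it refits the same data with an intercept, so that the amManual coefficient is the manual minus automatic difference (24.392 - 17.147, about 7.245 MPG) and R-squared is computed relative to the mean MPG. The names cars and fit_int are illustrative.

```r
# Work on a copy so the mtcars object used elsewhere is left untouched
cars <- datasets::mtcars
cars$am <- factor(cars$am, labels = c("Automatic", "Manual"))

# With an intercept, "Automatic" is the reference level and the
# amManual coefficient is the difference in mean MPG (manual - automatic)
fit_int <- lm(mpg ~ am, data = cars)

coef(fit_int)                   # intercept and difference
confint(fit_int)["amManual", ]  # 95% interval for the difference
summary(fit_int)$r.squared      # R-squared relative to the mean MPG
```

Both parameterizations give the same fitted values; the version with an intercept simply reports the contrast between the groups instead of the two group means.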
fit2 <- aov(mpg ~ ., mtcars)
summary(fit2)
##             Df Sum Sq Mean Sq F value   Pr(>F)
## cyl          1  817.7   817.7 116.425 5.03e-10 ***
## disp         1   37.6    37.6   5.353  0.03091 *
## hp           1    9.4     9.4   1.334  0.26103
## drat         1   16.5    16.5   2.345  0.14064
## wt           1   77.5    77.5  11.031  0.00324 **
## qsec         1    3.9     3.9   0.562  0.46166
## vs           1    0.1     0.1   0.018  0.89317
## am           1   14.5    14.5   2.061  0.16586
## gear         1    1.0     1.0   0.138  0.71365
## carb         1    0.4     0.4   0.058  0.81218
## Residuals   21  147.5     7.0
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
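From the summary, we look for terms with a p-value below 0.05: cyl, disp and wt. Note that aov reports sequential sums of squares, so each p-value depends on the order in which the variables enter the formula.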
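The row for am in the table above measures what transmission adds after the variables listed before it, since aov builds its table term by term. As a minimal sketch (not part of the original analysis), the code below moves am to the front of the formula and compares its sum of squares under the two orderings. The names cars, fit_orig, fit_first and the helper am_ss are hypothetical and introduced only for illustration.

```r
# Fresh copy of the data; am stays numeric (0/1), which still uses 1 df
cars <- datasets::mtcars

# Original ordering: am enters after cyl, disp, hp, drat, wt, qsec and vs
fit_orig <- aov(mpg ~ ., data = cars)

# Alternative ordering: am enters first, before any other variable
fit_first <- aov(mpg ~ am + cyl + disp + hp + drat + wt + qsec + vs + gear + carb,
                 data = cars)

# Hypothetical helper: extract the sequential sum of squares for am
am_ss <- function(fit) {
  tab <- summary(fit)[[1]]
  tab[trimws(rownames(tab)) == "am", "Sum Sq"]
}

c(am_after_others = am_ss(fit_orig), am_first = am_ss(fit_first))
```

A large gap between the two values indicates that much of the variation attributed to transmission is shared with variables such as cyl and wt.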
fit3 <- lm(mpg ~ cyl + disp + wt + am - 1, mtcars)
summary(fit3)
##
## Call:
## lm(formula = mpg ~ cyl + disp + wt + am - 1, data = mtcars)
##
## Residuals:
##    Min     1Q Median     3Q    Max
## -4.318 -1.362 -0.479  1.354  6.059
##
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)
## cyl         -1.784173   0.618192  -2.886  0.00758 **
## disp         0.007404   0.012081   0.613  0.54509
## wt          -3.583425   1.186504  -3.020  0.00547 **
## amAutomatic 40.898313   3.601540  11.356 8.68e-12 ***
## amManual    41.027379   3.008596  13.637 1.26e-13 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.642 on 27 degrees of freedom
## Multiple R-squared: 0.9866, Adjusted R-squared: 0.9841
## F-statistic: 397 on 5 and 27 DF, p-value: < 2.2e-16
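Summary 2:
The R-squared for the multivariable regression is 0.9866. As with the first model, there is no intercept, so this value is measured relative to zero and overstates how much of the variance in MPG the model explains. The p-values for cyl (0.00758) and wt (0.00547) are below 0.05, while disp (0.545) is not significant. After adjusting for these variables, the estimates for automatic (40.90) and manual (41.03) differ by only about 0.13 MPG, compared with about 7.2 MPG in the unadjusted model. This suggests that cyl and wt are confounding variables in the relationship between transmission and MPG.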
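The adjusted difference between the two transmission types can be read from a single coefficient if the model keeps its intercept. The following is a sketch under that assumption, not the model fitted in this report: it uses the same predictors as fit3, and the names cars and fit_adj are illustrative.

```r
# Work on a copy so the mtcars object used elsewhere is left untouched
cars <- datasets::mtcars
cars$am <- factor(cars$am, labels = c("Automatic", "Manual"))

# Same predictors as fit3, but with an intercept: the fitted values are
# identical, and amManual is now the adjusted manual - automatic difference
fit_adj <- lm(mpg ~ cyl + disp + wt + am, data = cars)

coef(fit_adj)["amManual"]       # equals 41.027379 - 40.898313 from fit3
confint(fit_adj)["amManual", ]  # 95% interval for the adjusted difference
summary(fit_adj)$r.squared      # R-squared relative to the mean MPG
```

If the interval for amManual contains zero, the data do not distinguish the two transmission types once cyl, disp and wt are held fixed.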
## boxplot for MPG according to different transmission type
g <- ggplot(aes(x = am, y = mpg), data = mtcars)
g <- g + geom_boxplot(aes(fill = am), col = "blue")
g <- g + xlab("Transmission") + ylab("MPG") + labs(title = "MPG on Transmission Type")
g
From the boxplot we can see that manual transmission has higher MPG overall than automatic transmission.
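The visual impression from the boxplot can be backed by numbers. As a hedged sketch that is not part of the original report, the code below computes the mean and median MPG per transmission type and runs a Welch two-sample t-test on the unadjusted difference. The names cars, group_means, group_medians and tt are illustrative.

```r
# Work on a copy so the mtcars object used elsewhere is left untouched
cars <- datasets::mtcars
cars$am <- factor(cars$am, labels = c("Automatic", "Manual"))

# Mean and median MPG per transmission type
group_means   <- tapply(cars$mpg, cars$am, mean)
group_medians <- tapply(cars$mpg, cars$am, median)
group_means
group_medians

# Welch two-sample t-test (the t.test default allows unequal variances)
tt <- t.test(mpg ~ am, data = cars)
tt$conf.int  # 95% interval for mean(Automatic) - mean(Manual)
tt$p.value
```

This test ignores every other variable, so it describes the raw difference shown in the boxplot rather than the effect of transmission after adjustment.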
## Residual plot for the multivariable regression model
par(mfrow = c(2, 2))
plot(fit3)
## Scatterplot matrix for the data
pairs(mpg ~., mtcars)