library(dplyr)
library(ggplot2)
library(corrplot)
library(formattable)
In this assignment I compared manual and automatic transmissions in terms of MPG (miles per US gallon). I used simple exploratory analysis as well as hypothesis testing and linear regression. As this paper shows, manual transmission is associated with higher MPG than automatic transmission.
The data was extracted from the 1974 Motor Trend US magazine and comprises fuel consumption and 10 aspects of automobile design and performance for 32 automobiles.
mtcars %>%
formattable(align="l") %>%
as.datatable()
The table and box plot show MPG by transmission type. MPG is higher for manual transmission than for automatic transmission.
# am is coded 0 = automatic, 1 = manual
mtcars %>%
  mutate(Transmission = ifelse(am == 0, "automatic", "manual")) %>%
  group_by(Transmission) %>%
  summarise(median = median(mpg), mean = mean(mpg), sd = sd(mpg), min = min(mpg), max = max(mpg)) %>%
  formattable(align = "l")
| Transmission | median | mean | sd | min | max |
|---|---|---|---|---|---|
| automatic | 17.3 | 17.14737 | 3.833966 | 10.4 | 24.4 |
| manual | 22.8 | 24.39231 | 6.166504 | 15.0 | 33.9 |
# Relabel am (0 = automatic, 1 = manual) before plotting
mtcars %>%
  mutate(am = ifelse(am == 0, "automatic", "manual")) %>%
  ggplot(aes(y = mpg, x = am, fill = am)) +
  geom_boxplot(alpha = .7, varwidth = TRUE)
We can also see that mpg is linearly correlated with the variables disp, hp, and wt (strong negative) and with drat (strong positive).
corrplot(corr = cor(mtcars[, c("mpg", "disp", "hp", "drat", "wt", "qsec")],
method = "pearson"),
method = "number", type = "lower")
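The correlations read off the plot can also be listed as plain numbers. The following is a minimal sketch in base R; the names `vars` and `mpg_cor` are introduced here for illustration and are not part of the original analysis.

```r
# Pearson correlation of mpg with each of the other plotted variables
vars <- c("disp", "hp", "drat", "wt", "qsec")
mpg_cor <- sapply(mtcars[, vars], function(x) cor(mtcars$mpg, x, method = "pearson"))

# Sort from most negative to most positive
print(round(sort(mpg_cor), 3))
```

Sorting the vector makes it easy to see which variables sit at the negative end (wt, disp, hp) and which at the positive end (drat, qsec).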
I’m interested in whether mean MPG differs significantly between manual and automatic transmissions at a significance level of 0.05.
t.test(formula = mpg ~ am, data = mtcars)
##
## Welch Two Sample t-test
##
## data: mpg by am
## t = -3.7671, df = 18.332, p-value = 0.001374
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -11.280194 -3.209684
## sample estimates:
## mean in group 0 mean in group 1
## 17.14737 24.39231
The p-value is 0.001374, which is below 0.05, so we reject the null hypothesis and infer that there is a statistically significant difference in mean MPG between manual transmission cars and automatic transmission cars.
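To make the test less of a black box, the Welch statistic can be recomputed by hand from the group means and variances. This is a sketch under the assumption that `t.test()` used its default unequal-variance (Welch) form, as the output header indicates; all variable names below are introduced for illustration.

```r
# Split mpg by transmission type (am: 0 = automatic, 1 = manual)
auto <- mtcars$mpg[mtcars$am == 0]
manual <- mtcars$mpg[mtcars$am == 1]

# Variance of each group mean
v_auto <- var(auto) / length(auto)
v_manual <- var(manual) / length(manual)

# Welch t statistic: difference in means divided by its standard error
t_stat <- (mean(auto) - mean(manual)) / sqrt(v_auto + v_manual)

# Welch-Satterthwaite approximation of the degrees of freedom
df_welch <- (v_auto + v_manual)^2 /
  (v_auto^2 / (length(auto) - 1) + v_manual^2 / (length(manual) - 1))

# Two-sided p-value from the t distribution
p_value <- 2 * pt(-abs(t_stat), df_welch)

print(c(t = t_stat, df = df_welch, p = p_value))
```

The printed values should agree with the t, df, and p-value lines of the `t.test()` output above.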
I’m going to use the step() function to select a model automatically by the AIC criterion with a stepwise algorithm.
fit <- step(object = lm(formula = mpg ~ ., data = mtcars), direction = "both")
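As a quick sanity check on the stepwise selection, the AIC of the full model can be compared with that of the three-predictor model that appears in the summary output of this report. This is only a sketch: the names `full` and `reduced` are introduced here, and `AIC()` differs from the value `step()` prints by an additive constant for a fixed dataset, so only the ordering is comparable.

```r
# Full model with every predictor, and the model reported in the summary output
full <- lm(mpg ~ ., data = mtcars)
reduced <- lm(mpg ~ wt + qsec + am, data = mtcars)

# Lower AIC indicates a better trade-off between fit and model size
print(AIC(full, reduced))
```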
Based on the diagnostic plots, the residuals appear approximately normally distributed and homoskedastic.
par(mfrow=c(2,2))
plot(fit)
summary(fit)
##
## Call:
## lm(formula = mpg ~ wt + qsec + am, data = mtcars)
##
## Residuals:
## Min 1Q Median 3Q Max
## -3.4811 -1.5555 -0.7257 1.4110 4.6610
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 9.6178 6.9596 1.382 0.177915
## wt -3.9165 0.7112 -5.507 6.95e-06 ***
## qsec 1.2259 0.2887 4.247 0.000216 ***
## am1 2.9358 1.4109 2.081 0.046716 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.459 on 28 degrees of freedom
## Multiple R-squared: 0.8497, Adjusted R-squared: 0.8336
## F-statistic: 52.75 on 3 and 28 DF, p-value: 1.21e-11
Looking at the results, we can say that the adjusted R-squared for the model is 0.8336, and the coefficients of all three variables are significant at the 5% significance level. Based on the coefficients, holding wt and qsec constant, a manual car gains an estimated 2.9358 MPG over an automatic.
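A point estimate is easier to judge alongside its uncertainty. The sketch below refits the selected model and prints 95% confidence intervals for the coefficients. It assumes am was treated as a factor, which is what the `am1` label in the summary suggests; the names `mtcars_f` and `fit_check` are introduced here for illustration.

```r
# Treat am as a factor so the coefficient is labelled am1, as in the summary
mtcars_f <- transform(mtcars, am = factor(am))
fit_check <- lm(mpg ~ wt + qsec + am, data = mtcars_f)

# Point estimates alongside their 95% confidence intervals
print(cbind(estimate = coef(fit_check), confint(fit_check, level = 0.95)))
```

If the interval for `am1` excludes zero, that agrees with the coefficient being significant at the 5% level.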