Motor Trend wants to know whether the type of transmission has a considerable effect on gas mileage. The data are taken from 32 different cars, of which 19 are automatic and 13 are manual.
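rm(list = ls())
library(dplyr)    # provides if_else() and %>%
library(ggplot2)  # provides ggplot() and the geoms used below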
mtcars$amlabel <- if_else(mtcars$am == 0, "Automatic", "Manual")
mtcars$am <- as.factor(mtcars$am)
mtcars$amlabel <- as.factor(mtcars$amlabel)
head(mtcars)
## mpg cyl disp hp drat wt qsec vs am gear carb
## Mazda RX4 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4
## Mazda RX4 Wag 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4
## Datsun 710 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1
## Hornet 4 Drive 21.4 6 258 110 3.08 3.215 19.44 1 0 3 1
## Hornet Sportabout 18.7 8 360 175 3.15 3.440 17.02 0 0 3 2
## Valiant 18.1 6 225 105 2.76 3.460 20.22 1 0 3 1
## amlabel
## Mazda RX4 Manual
## Mazda RX4 Wag Manual
## Datsun 710 Manual
## Hornet 4 Drive Automatic
## Hornet Sportabout Automatic
## Valiant Automatic
Dimension <- c("Rows", "Columns")
Number <- dim(mtcars)
data.frame(Dimension, Number)
## Dimension Number
## 1 Rows 32
## 2 Columns 12
Auto <- length(which(mtcars$amlabel == "Automatic"))
Man <- length(which(mtcars$amlabel == "Manual"))
data.frame(Auto, Man)
## Auto Man
## 1 19 13
A box plot and a t-test are chosen to determine whether the two types of transmission have statistically similar fuel economy.
# Refer to columns by bare name inside aes(); mtcars$ is not needed there
mtcars %>%
  ggplot(aes(amlabel, mpg, fill = am)) +
  geom_boxplot() +
  geom_jitter(color = "black") +
  scale_fill_brewer(palette = "Dark2") +
  theme(legend.position = "none",
        axis.title.x = element_blank()) +
  ylab("mpg")
Auto_cars <- mtcars[which(mtcars$amlabel == "Automatic"), ]
Man_cars <- mtcars[which(mtcars$amlabel == "Manual"), ]
t.test(Man_cars$mpg, Auto_cars$mpg)
##
## Welch Two Sample t-test
##
## data: Man_cars$mpg and Auto_cars$mpg
## t = 3.7671, df = 18.332, p-value = 0.001374
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## 3.209684 11.280194
## sample estimates:
## mean of x mean of y
## 24.39231 17.14737
Based on the box plot and the t-test, the manual cars are more fuel efficient in this sample. The mean mpg of the manual cars is 24.4, while the mean mpg of the automatic cars is 17.1, a difference of about 7.2 mpg (95% confidence interval 3.2 to 11.3). With a p-value of 0.0014, we can reject the null hypothesis that automatic and manual cars achieve the same mean mpg.
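The same comparison can be reproduced with the formula interface of t.test(), which avoids building the two subsets by hand. This is only a sketch: the copy `cars` and the column `trans` are names introduced here for illustration and are not part of the original analysis.

```r
# Work on a fresh copy of the built-in data so the sketch runs on its own
# (`cars` and `trans` are hypothetical names, not from the original analysis)
cars <- datasets::mtcars
cars$trans <- factor(cars$am, levels = c(0, 1),
                     labels = c("Automatic", "Manual"))

# Welch two-sample t-test of mpg across the two levels of trans
tt <- t.test(mpg ~ trans, data = cars)

# Group means and their difference (Manual minus Automatic)
group_means <- tapply(cars$mpg, cars$trans, mean)
mean_diff <- unname(group_means["Manual"] - group_means["Automatic"])

# t.test() reports the interval for Automatic minus Manual;
# negate and reverse it so it reads Manual minus Automatic
ci_manual_minus_auto <- as.numeric(rev(-tt$conf.int))

print(round(group_means, 2))
print(round(mean_diff, 2))
print(round(ci_manual_minus_auto, 2))
print(signif(tt$p.value, 4))
```

The interval and p-value should agree with the Welch test printed above, since only the way the groups are specified has changed.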
The findings from the box plot can be further assessed with a linear model of mpg on transmission type. This model does not take into account the other characteristics of the car, such as weight, displacement, and number of cylinders.
fit_basic <- lm(mpg ~ am, mtcars)
summary(fit_basic)
##
## Call:
## lm(formula = mpg ~ am, data = mtcars)
##
## Residuals:
## Min 1Q Median 3Q Max
## -9.3923 -3.0923 -0.2974 3.2439 9.5077
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 17.147 1.125 15.247 1.13e-15 ***
## am1 7.245 1.764 4.106 0.000285 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 4.902 on 30 degrees of freedom
## Multiple R-squared: 0.3598, Adjusted R-squared: 0.3385
## F-statistic: 16.86 on 1 and 30 DF, p-value: 0.000285
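As a quick check on how to read these coefficients, the sketch below compares them with the group means. It assumes a fresh copy of the data, called `cars` here purely for illustration.

```r
# Fresh copy of the built-in data (`cars` is a hypothetical name)
cars <- datasets::mtcars
cars$am <- factor(cars$am)  # 0 = automatic, 1 = manual

fit <- lm(mpg ~ am, data = cars)
group_means <- tapply(cars$mpg, cars$am, mean)

# The intercept is the mean mpg of the reference level (automatic);
# am1 is the manual mean minus the automatic mean
intercept <- unname(coef(fit)["(Intercept)"])
am_effect <- unname(coef(fit)["am1"])
mean_gap  <- unname(group_means["1"] - group_means["0"])

# With a single two-level predictor, R-squared equals the squared
# correlation between mpg and the 0/1 transmission indicator
r2      <- summary(fit)$r.squared
r2_corr <- cor(datasets::mtcars$mpg, datasets::mtcars$am)^2

print(c(intercept = intercept, am1 = am_effect, mean_gap = mean_gap))
print(c(r_squared = r2, squared_cor = r2_corr))
```

In other words, this one-predictor model restates the difference in group means; it says nothing yet about other variables that differ between the two groups.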
From the basic linear model, an R^2 of 0.36 shows that transmission type alone explains only about 36% of the variation in mpg, so the correlation between transmission type and fuel economy is loose. To determine how much each variable influences the fuel economy rating, we can look at the significance of each variable in an analysis of variance of mpg on all the other variables.
# amlabel duplicates am, so leave it out of the model
fit_var <- aov(mpg ~ . - amlabel, mtcars)
summary(fit_var)
## Df Sum Sq Mean Sq F value Pr(>F)
## cyl 1 817.7 817.7 116.425 5.03e-10 ***
## disp 1 37.6 37.6 5.353 0.03091 *
## hp 1 9.4 9.4 1.334 0.26103
## drat 1 16.5 16.5 2.345 0.14064
## wt 1 77.5 77.5 11.031 0.00324 **
## qsec 1 3.9 3.9 0.562 0.46166
## vs 1 0.1 0.1 0.018 0.89317
## am 1 14.5 14.5 2.061 0.16586
## gear 1 1.0 1.0 0.138 0.71365
## carb 1 0.4 0.4 0.058 0.81218
## Residuals 21 147.5 7.0
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
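One caveat when reading this table: aov() reports sequential sums of squares, so each row is tested after the terms listed above it, and the result for a term can change with its position in the formula. The sketch below illustrates this with just two predictors. The object names are introduced here for illustration, and drop1() is shown as one order-independent alternative, not as the method used in the original analysis.

```r
# Fresh copy of the built-in data (`cars` is a hypothetical name)
cars <- datasets::mtcars
cars$am <- factor(cars$am)

# Same two predictors, entered in different orders
am_first <- anova(lm(mpg ~ am + wt, data = cars))
am_last  <- anova(lm(mpg ~ wt + am, data = cars))

# Sum of squares credited to am in each ordering
ss_am_first <- am_first["am", "Sum Sq"]
ss_am_last  <- am_last["am", "Sum Sq"]
print(c(am_first = ss_am_first, am_last = ss_am_last))

# drop1() tests each term given all the others, so order does not matter
marginal <- drop1(lm(mpg ~ wt + am, data = cars), test = "F")
print(marginal)
```

When am is entered first it is credited with far more of the variation than when it follows wt, which is why a low position in a sequential table is not by itself evidence that a variable is unimportant.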
From the variance analysis, the terms with the smallest p-values are cyl, wt, disp, drat, and am, although only cyl, wt, and disp are significant at the 0.05 level. We’ll now construct a linear model with these five terms to show that fuel economy is more strongly correlated with factors other than the transmission.
multifit <- lm(mpg ~ cyl + wt + disp + drat + am, mtcars)
summary(multifit)
##
## Call:
## lm(formula = mpg ~ cyl + wt + disp + drat + am, data = mtcars)
##
## Residuals:
## Min 1Q Median 3Q Max
## -4.3176 -1.3829 -0.4728 1.3229 6.0596
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 41.296380 7.538394 5.478 9.56e-06 ***
## cyl -1.793995 0.650540 -2.758 0.01051 *
## wt -3.587041 1.210500 -2.963 0.00643 **
## disp 0.007375 0.012319 0.599 0.55462
## drat -0.093628 1.548780 -0.060 0.95226
## am1 0.172981 1.530043 0.113 0.91085
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.692 on 26 degrees of freedom
## Multiple R-squared: 0.8327, Adjusted R-squared: 0.8005
## F-statistic: 25.88 on 5 and 26 DF, p-value: 2.528e-09
par(mfrow = c(2, 2))
plot(multifit)
After refitting the linear model with the parameters that correlate more closely with fuel economy, we find that the transmission has a very small effect compared with weight and number of cylinders: the am1 coefficient is 0.17 mpg (p = 0.91), while wt and cyl have coefficients of -3.59 mpg per 1000 lb and -1.79 mpg per cylinder, both with p-values below 0.05. The diagnostic plots support the model. The residuals vs. fitted plot shows no pattern, which indicates a good fit, and the Normal Q-Q plot shows little to no skew in the residuals. Finally, the residuals vs. leverage plot shows a few outliers that could influence the fit.
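Two of these points can also be checked numerically. The sketch below tests whether am adds anything once the other four terms are in the model, and lists the cars with the largest Cook's distance. The names `reduced`, `full`, and `cars` are introduced here for illustration, and Cook's distance is just one common way to flag influential points.

```r
# Fresh copy of the built-in data (`cars` is a hypothetical name)
cars <- datasets::mtcars
cars$am <- factor(cars$am)

# Models without and with transmission type
reduced <- lm(mpg ~ cyl + wt + disp + drat, data = cars)
full    <- lm(mpg ~ cyl + wt + disp + drat + am, data = cars)

# F-test for adding am to a model that already has the other four terms
cmp  <- anova(reduced, full)
p_am <- cmp[2, "Pr(>F)"]
print(cmp)

# Cook's distance for each car in the full model, largest first
cd <- sort(cooks.distance(full), decreasing = TRUE)
print(round(head(cd, 3), 3))
```

Because am contributes a single coefficient, the p-value of this F-test matches the p-value of the am1 row in the model summary above.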