Gas mileage study based on transmission type

Motor Trend wants to understand whether the type of transmission has a considerable effect on gas mileage. The data are taken from 32 different cars, of which 19 have automatic and 13 have manual transmissions.

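rm(list = ls())
library(dplyr)    # provides if_else() and the %>% pipe used below
library(ggplot2)  # provides ggplot() and the geom_*/scale_*/theme functions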

mtcars$amlabel <- if_else(mtcars$am == 0, "Automatic", "Manual")
mtcars$am <- as.factor(mtcars$am)
mtcars$amlabel <- as.factor(mtcars$amlabel)
head(mtcars)
##                    mpg cyl disp  hp drat    wt  qsec vs am gear carb
## Mazda RX4         21.0   6  160 110 3.90 2.620 16.46  0  1    4    4
## Mazda RX4 Wag     21.0   6  160 110 3.90 2.875 17.02  0  1    4    4
## Datsun 710        22.8   4  108  93 3.85 2.320 18.61  1  1    4    1
## Hornet 4 Drive    21.4   6  258 110 3.08 3.215 19.44  1  0    3    1
## Hornet Sportabout 18.7   8  360 175 3.15 3.440 17.02  0  0    3    2
## Valiant           18.1   6  225 105 2.76 3.460 20.22  1  0    3    1
##                     amlabel
## Mazda RX4            Manual
## Mazda RX4 Wag        Manual
## Datsun 710           Manual
## Hornet 4 Drive    Automatic
## Hornet Sportabout Automatic
## Valiant           Automatic
Dimension <- c("Rows", "Columns")
Number <- dim(mtcars)
data.frame(Dimension, Number)
##   Dimension Number
## 1      Rows     32
## 2   Columns     12
Auto <- length(which(mtcars$amlabel == "Automatic"))
Man <- length(which(mtcars$amlabel == "Manual"))
data.frame(Auto, Man)
##   Auto Man
## 1   19  13

Is Automatic or Manual more efficient?

A boxplot and a two-sample t-test are chosen to determine whether the two types of transmission have statistically similar fuel ratings.

mtcars %>%
  ggplot(aes(amlabel, mpg, fill = am)) + 
  geom_boxplot() +
  geom_jitter(color = "black") + 
  scale_fill_brewer(palette = "Dark2") + 
  theme(legend.position = "none",
        axis.title.x = element_blank()) +
  ylab("mpg") 

Auto_cars <- mtcars[which(mtcars$amlabel == "Automatic"), ]
Man_cars <- mtcars[which(mtcars$amlabel == "Manual"), ]

t.test(Man_cars$mpg, Auto_cars$mpg)
## 
##  Welch Two Sample t-test
## 
## data:  Man_cars$mpg and Auto_cars$mpg
## t = 3.7671, df = 18.332, p-value = 0.001374
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##   3.209684 11.280194
## sample estimates:
## mean of x mean of y 
##  24.39231  17.14737

Based on the box plots and the t-test, the manual cars are more efficient in this sample. The mean mpg of the manual cars is 24.4, while the mean mpg of the automatic cars is 17.1, a difference of about 7.2 mpg (95% confidence interval 3.2 to 11.3). With a p-value of 0.0014, we can reject the null hypothesis that automatic and manual cars have the same mean mpg.
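
As a check on where the Welch statistic comes from, the t value, degrees of freedom, and p-value can be recomputed by hand. This is a minimal sketch that works on a fresh copy of the built-in data; the names `cars`, `v_man`, `v_auto`, `t_stat`, `df_welch`, and `p_value` are introduced here for illustration only and are not part of the original analysis.

```r
# Fresh copy of the built-in data so the sketch is self-contained
cars <- datasets::mtcars
manual    <- cars$mpg[cars$am == 1]
automatic <- cars$mpg[cars$am == 0]

# Estimated variance of each group's sample mean
v_man  <- var(manual) / length(manual)
v_auto <- var(automatic) / length(automatic)

# Welch t statistic: difference in means over its standard error
t_stat <- (mean(manual) - mean(automatic)) / sqrt(v_man + v_auto)

# Welch-Satterthwaite approximation to the degrees of freedom
df_welch <- (v_man + v_auto)^2 /
  (v_man^2 / (length(manual) - 1) + v_auto^2 / (length(automatic) - 1))

# Two-sided p-value from the t distribution
p_value <- 2 * pt(-abs(t_stat), df_welch)

round(c(t = t_stat, df = df_welch, p = p_value), 4)
```

The three values should agree with the t.test() output shown above.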

Linear Model breakdown

The findings from the box plot can be further assessed by fitting a linear model of mpg on transmission type. This model does not take into account other characteristics of the car such as weight, displacement, or number of cylinders.

fit_basic <- lm(mpg ~ am, mtcars)
summary(fit_basic)
## 
## Call:
## lm(formula = mpg ~ am, data = mtcars)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -9.3923 -3.0923 -0.2974  3.2439  9.5077 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   17.147      1.125  15.247 1.13e-15 ***
## am1            7.245      1.764   4.106 0.000285 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 4.902 on 30 degrees of freedom
## Multiple R-squared:  0.3598, Adjusted R-squared:  0.3385 
## F-statistic: 16.86 on 1 and 30 DF,  p-value: 0.000285

From the basic linear model, an R^2 of 0.36 shows that transmission type alone explains only about 36% of the variation in fuel economy, which is a fairly loose relationship. To determine how much transmission influences fuel economy relative to the other variables, we can look at the significance of each variable in an analysis of variance on all of the predictors.
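
With a single two-level factor as the predictor, the coefficients of the basic model are just the group means in another form: the intercept is the mean mpg of the reference level (automatic) and the slope is the difference between the manual and automatic means. A short sketch on a fresh copy of the data; `cars`, `group_means`, and `fit` are illustrative names.

```r
cars <- datasets::mtcars

# Mean mpg per transmission code (0 = automatic, 1 = manual)
group_means <- tapply(cars$mpg, cars$am, mean)

# Same model as the basic fit, with am treated as a factor
fit <- lm(mpg ~ factor(am), data = cars)

# Intercept vs. automatic mean, slope vs. difference in means
rbind(
  from_lm    = unname(coef(fit)),
  from_means = c(group_means[["0"]], group_means[["1"]] - group_means[["0"]])
)
```

This is why the slope of about 7.2 mpg matches the difference in means from the t-test.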

# amlabel duplicates am, so leave it out of the "all other columns" formula
fit_var <- aov(mpg ~ . - amlabel, mtcars)
summary(fit_var)
##             Df Sum Sq Mean Sq F value   Pr(>F)    
## cyl          1  817.7   817.7 116.425 5.03e-10 ***
## disp         1   37.6    37.6   5.353  0.03091 *  
## hp           1    9.4     9.4   1.334  0.26103    
## drat         1   16.5    16.5   2.345  0.14064    
## wt           1   77.5    77.5  11.031  0.00324 ** 
## qsec         1    3.9     3.9   0.562  0.46166    
## vs           1    0.1     0.1   0.018  0.89317    
## am           1   14.5    14.5   2.061  0.16586    
## gear         1    1.0     1.0   0.138  0.71365    
## carb         1    0.4     0.4   0.058  0.81218    
## Residuals   21  147.5     7.0                     
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

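From the analysis of variance, cyl, wt, and disp are significant at the 0.05 level, and drat and am have the next smallest p-values (0.14 and 0.17). Note that aov() uses sequential sums of squares, so each row is adjusted only for the terms listed above it. We'll now construct a linear model with these five parameters to show that fuel economy is more strongly correlated with factors other than the transmission.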
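
Because the sums of squares in an aov() table are sequential, the variance credited to a term depends on where it appears in the formula. The sketch below illustrates this with a smaller two-predictor model (an assumption made for brevity, not the model used in this report) and then shows drop1() as one order-independent alternative; `cars`, `ss_am_first`, and `ss_am_last` are illustrative names.

```r
cars <- datasets::mtcars

# Sum of squares credited to am when it enters the model first ...
ss_am_first <- anova(lm(mpg ~ am + wt, data = cars))["am", "Sum Sq"]

# ... and when it enters after wt has already been accounted for
ss_am_last <- anova(lm(mpg ~ wt + am, data = cars))["am", "Sum Sq"]

c(am_first = ss_am_first, am_last = ss_am_last)

# drop1() tests each term given all the others, so term order does not matter
drop1(lm(mpg ~ wt + am, data = cars), test = "F")
```

The same caution applies to the table above: am is listed after cyl, disp, hp, drat, wt, qsec, and vs, so its row reflects only what is left after those terms.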

multifit <- lm(mpg ~ cyl + wt + disp + drat + am, mtcars)
summary(multifit)
## 
## Call:
## lm(formula = mpg ~ cyl + wt + disp + drat + am, data = mtcars)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -4.3176 -1.3829 -0.4728  1.3229  6.0596 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 41.296380   7.538394   5.478 9.56e-06 ***
## cyl         -1.793995   0.650540  -2.758  0.01051 *  
## wt          -3.587041   1.210500  -2.963  0.00643 ** 
## disp         0.007375   0.012319   0.599  0.55462    
## drat        -0.093628   1.548780  -0.060  0.95226    
## am1          0.172981   1.530043   0.113  0.91085    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2.692 on 26 degrees of freedom
## Multiple R-squared:  0.8327, Adjusted R-squared:  0.8005 
## F-statistic: 25.88 on 5 and 26 DF,  p-value: 2.528e-09
par(mfrow = c(2, 2))  # 2 x 2 grid for the four diagnostic plots
plot(multifit)

After refitting the linear model with parameters that correlate more closely with fuel economy, we find that the transmission has a very small effect compared with weight and number of cylinders: the am1 coefficient drops from 7.2 mpg to 0.17 mpg (p = 0.91), while wt and cyl remain significant. The diagnostic plots support the fit. The residuals vs. fitted plot shows no clear pattern, which indicates a reasonable model fit, and the normal Q-Q plot shows little to no skew in the residuals. Finally, the residuals vs. leverage plot shows a few outlying points that could influence the fit.
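
The points flagged in the residuals vs. leverage plot can also be listed numerically. This is a rough sketch that refits the same model on a fresh copy of the data; the names `cars`, `fit`, `lev`, `cooks`, and `threshold` are illustrative, and the cutoff of twice the average leverage is a common rule of thumb assumed here, not something used in the original analysis.

```r
cars <- datasets::mtcars
cars$am <- factor(cars$am)

# Same formula as the multi-variable model above
fit <- lm(mpg ~ cyl + wt + disp + drat + am, data = cars)

lev   <- hatvalues(fit)       # leverage of each car
cooks <- cooks.distance(fit)  # influence of each car on the fitted coefficients

# Rule-of-thumb cutoff (assumption): twice the average leverage, 2 * p / n
threshold <- 2 * length(coef(fit)) / nrow(cars)

# Cars with unusually high leverage
names(lev)[lev > threshold]

# The three most influential cars by Cook's distance
head(sort(cooks, decreasing = TRUE), 3)
```

Refitting the model without the flagged cars is one way to see whether the conclusion about transmission depends on them.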