Instructions

You work for Motor Trend, a magazine about the automobile industry. Using a data set of cars, the magazine wants to explore the relationship between a set of variables and miles per gallon (MPG), the outcome. It is particularly interested in the following two questions:

Is an automatic or a manual transmission better for MPG?
Quantify the MPG difference between automatic and manual transmissions.

Data Processing

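library(ggplot2)
library(dplyr)
library(MASS)   # for stepAIC(); note that MASS masks dplyr::select()
data(mtcars)
str(mtcars)
df <- mtcars
# Recode am (0/1) as a labelled factor; "automatic" is the reference level
df$am <- factor(df$am, levels = c(0, 1), labels = c("automatic", "manual"))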

The data frame has 32 observations of 11 variables:
[, 1] mpg Miles/(US) gallon
[, 2] cyl Number of cylinders
[, 3] disp Displacement (cu.in.)
[, 4] hp Gross horsepower
[, 5] drat Rear axle ratio
[, 6] wt Weight (lb/1000)
[, 7] qsec 1/4 mile time
[, 8] vs Engine (0 = V-shaped, 1 = straight)
[, 9] am Transmission (0 = automatic, 1 = manual)
[,10] gear Number of forward gears
[,11] carb Number of carburetors
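Because the analysis depends on recoding `am` correctly, a quick sanity check is to tabulate the recoded variable. The sketch below is self-contained and works directly on the built-in `mtcars`. `am_label` is an illustrative name, and the labels assume the 0 = automatic, 1 = manual coding listed above.

```r
# Recode the 0/1 transmission indicator into labelled groups (assumes 0 = automatic, 1 = manual)
data(mtcars)
am_label <- factor(mtcars$am, levels = c(0, 1),
                   labels = c("automatic", "manual"))

# Count the cars in each transmission group
print(table(am_label))
```

The groups are unbalanced, with more automatic than manual cars. This matters for the t-test later on, which compares means between groups of different sizes.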

summary(mtcars)
##       mpg             cyl             disp             hp       
##  Min.   :10.40   Min.   :4.000   Min.   : 71.1   Min.   : 52.0  
##  1st Qu.:15.43   1st Qu.:4.000   1st Qu.:120.8   1st Qu.: 96.5  
##  Median :19.20   Median :6.000   Median :196.3   Median :123.0  
##  Mean   :20.09   Mean   :6.188   Mean   :230.7   Mean   :146.7  
##  3rd Qu.:22.80   3rd Qu.:8.000   3rd Qu.:326.0   3rd Qu.:180.0  
##  Max.   :33.90   Max.   :8.000   Max.   :472.0   Max.   :335.0  
##       drat             wt             qsec             vs        
##  Min.   :2.760   Min.   :1.513   Min.   :14.50   Min.   :0.0000  
##  1st Qu.:3.080   1st Qu.:2.581   1st Qu.:16.89   1st Qu.:0.0000  
##  Median :3.695   Median :3.325   Median :17.71   Median :0.0000  
##  Mean   :3.597   Mean   :3.217   Mean   :17.85   Mean   :0.4375  
##  3rd Qu.:3.920   3rd Qu.:3.610   3rd Qu.:18.90   3rd Qu.:1.0000  
##  Max.   :4.930   Max.   :5.424   Max.   :22.90   Max.   :1.0000  
##        am              gear            carb      
##  Min.   :0.0000   Min.   :3.000   Min.   :1.000  
##  1st Qu.:0.0000   1st Qu.:3.000   1st Qu.:2.000  
##  Median :0.0000   Median :4.000   Median :2.000  
##  Mean   :0.4062   Mean   :3.688   Mean   :2.812  
##  3rd Qu.:1.0000   3rd Qu.:4.000   3rd Qu.:4.000  
##  Max.   :1.0000   Max.   :5.000   Max.   :8.000

Is an automatic or manual transmission better for MPG?


Mean Comparison

meanAM<- df %>% group_by(am) %>%
         summarise(mean=mean(mpg))
meanAM[meanAM$am=="manual","mean"]-meanAM[meanAM$am=="automatic","mean"]
##       mean
## 1 7.244939

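Manual-transmission cars appear considerably more economical than automatic ones. Their mean MPG is 24.39, compared with 17.15 for automatics, a difference of about 7.24 MPG. This raw comparison does not account for any other differences between the two groups of cars.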
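The same group means can be computed without dplyr as a cross-check. Below is a minimal base-R sketch using `tapply()`. `am_label`, `group_means` and `mean_diff` are illustrative names, and the labels assume the documented 0/1 coding of `am`.

```r
# Base-R cross-check of the mean MPG by transmission type
data(mtcars)
am_label <- factor(mtcars$am, levels = c(0, 1),
                   labels = c("automatic", "manual"))

# tapply() applies mean() to mpg within each transmission group
group_means <- tapply(mtcars$mpg, am_label, mean)

# Difference in means: manual minus automatic
mean_diff <- unname(group_means["manual"] - group_means["automatic"])
print(group_means)
print(mean_diff)
```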

Unpaired two-sample t-test

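The unpaired two-sample t-test compares the means of two independent groups. Setting var.equal = TRUE pools the variance of the two groups (Student's t-test). R's default is Welch's t-test, which does not assume equal variances.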
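For reference, the pooled (equal-variance) statistic requested by var.equal = TRUE can be written as follows. Subscript A denotes automatic and M denotes manual; n_A = 19 and n_M = 13 are the group sizes, \bar{x} the group means, and s^2 the group sample variances. t.test() subtracts the second factor level from the first, so the numerator is automatic minus manual.

```latex
s_p^2 = \frac{(n_A - 1)\, s_A^2 + (n_M - 1)\, s_M^2}{n_A + n_M - 2}, \qquad
t = \frac{\bar{x}_A - \bar{x}_M}{s_p \sqrt{\frac{1}{n_A} + \frac{1}{n_M}}}, \qquad
\nu = n_A + n_M - 2 = 19 + 13 - 2 = 30
```

Under the null hypothesis of equal means, t follows a t distribution with \nu degrees of freedom. The two-sided p-value is 2 P(T_\nu \ge |t|).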

res <- t.test(mpg ~ am, data = df, var.equal = TRUE)
res
## 
##  Two Sample t-test
## 
## data:  mpg by am
## t = -4.1061, df = 30, p-value = 0.000285
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -10.84837  -3.64151
## sample estimates:
## mean in group automatic    mean in group manual 
##                17.14737                24.39231

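The p-value of the test (0.000285) is below the significance level alpha = 0.05, so we reject the null hypothesis that the two transmission types have equal mean MPG. The 95% confidence interval for the difference in means (automatic minus manual) runs from -10.85 to -3.64 MPG, which indicates that manual cars have a higher mean MPG by roughly 3.6 to 10.8 MPG.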
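To make the mechanics of the test concrete, the sketch below recomputes the pooled-variance t statistic, its degrees of freedom and its two-sided p-value by hand. It then compares them with t.test(). The vectors `auto` and `manual`, and the other intermediate names, are illustrative. The 0/1 split again assumes the documented coding of `am`.

```r
# Recompute the Student (pooled-variance) two-sample t-test by hand
data(mtcars)
auto   <- mtcars$mpg[mtcars$am == 0]   # automatic cars
manual <- mtcars$mpg[mtcars$am == 1]   # manual cars
n_a <- length(auto)
n_m <- length(manual)

# Pooled variance: assumes both groups share one population variance
sp2 <- ((n_a - 1) * var(auto) + (n_m - 1) * var(manual)) / (n_a + n_m - 2)
se  <- sqrt(sp2 * (1 / n_a + 1 / n_m))

# t statistic (automatic minus manual), degrees of freedom, two-sided p-value
t_manual  <- (mean(auto) - mean(manual)) / se
df_pooled <- n_a + n_m - 2
p_manual  <- 2 * pt(-abs(t_manual), df = df_pooled)

# Built-in equivalent for comparison
res_check <- t.test(auto, manual, var.equal = TRUE)
print(c(t = t_manual, df = df_pooled, p = p_manual))
```

The hand-computed values should agree with t.test() to floating-point precision. Dropping var.equal = TRUE would switch t.test() to Welch's test, which uses a different standard error and fractional degrees of freedom.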

Model Selection

fit <- lm(mpg~., data = df)
step <- stepAIC(fit, direction="both")
step$anova # display results
## Stepwise Model Path 
## Analysis of Deviance Table
## 
## Initial Model:
## mpg ~ cyl + disp + hp + drat + wt + qsec + vs + am + gear + carb
## 
## Final Model:
## mpg ~ wt + qsec + am
## 
## 
##     Step Df   Deviance Resid. Df Resid. Dev      AIC
## 1                             21   147.4944 70.89774
## 2  - cyl  1 0.07987121        22   147.5743 68.91507
## 3   - vs  1 0.26852280        23   147.8428 66.97324
## 4 - carb  1 0.68546077        24   148.5283 65.12126
## 5 - gear  1 1.56497053        25   150.0933 63.45667
## 6 - drat  1 3.34455117        26   153.4378 62.16190
## 7 - disp  1 6.62865369        27   160.0665 61.51530
## 8   - hp  1 9.21946935        28   169.2859 61.30730
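The AIC column in the stepwise table is computed with extractAIC(). For a linear model this equals n log(RSS/n) + 2k, where k is the number of estimated coefficients. The sketch below reproduces the final model's value from its residual sum of squares. It fits the model on the numeric 0/1 `am` from `mtcars`, which gives the same fit as the labelled version. `final_fit` and `aic_by_hand` are illustrative names.

```r
# Reproduce the AIC reported by stepAIC() for the final model
data(mtcars)
final_fit <- lm(mpg ~ wt + qsec + am, data = mtcars)

n   <- nobs(final_fit)                 # 32 cars
rss <- sum(residuals(final_fit)^2)     # residual sum of squares ("Resid. Dev")
k   <- length(coef(final_fit))         # intercept + 3 slopes = 4

# AIC on the scale used by extractAIC() for lm objects
aic_by_hand <- n * log(rss / n) + 2 * k
print(c(rss = rss, aic = aic_by_hand))
print(extractAIC(final_fit))           # returns c(edf, AIC)
```

This AIC omits additive constants, so it differs from AIC() on the same model. That does not matter here, because only differences between candidate models drive the selection.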

Fit the model

fit <- lm(mpg~wt + qsec + am, data = df)
summary(fit)
## 
## Call:
## lm(formula = mpg ~ wt + qsec + am, data = df)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -3.4811 -1.5555 -0.7257  1.4110  4.6610 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   9.6178     6.9596   1.382 0.177915    
## wt           -3.9165     0.7112  -5.507 6.95e-06 ***
## qsec          1.2259     0.2887   4.247 0.000216 ***
## ammanual      2.9358     1.4109   2.081 0.046716 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2.459 on 28 degrees of freedom
## Multiple R-squared:  0.8497, Adjusted R-squared:  0.8336 
## F-statistic: 52.75 on 3 and 28 DF,  p-value: 1.21e-11
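The second question asks for a quantified difference, so an interval estimate for the manual-transmission coefficient is a useful complement to the point estimate. Below is a minimal sketch with confint(). It refits the model on a copy of `mtcars` under the illustrative names `cars_df` and `fit_check`, and assumes the same automatic/manual factor coding as above.

```r
# 95% confidence interval for the adjusted manual-vs-automatic effect
data(mtcars)
cars_df <- mtcars
cars_df$am <- factor(cars_df$am, levels = c(0, 1),
                     labels = c("automatic", "manual"))

fit_check <- lm(mpg ~ wt + qsec + am, data = cars_df)

# Interval for the "ammanual" coefficient only
ci <- confint(fit_check, "ammanual", level = 0.95)
print(ci)
```

Because the coefficient's p-value (0.0467) is only just below 0.05, the interval excludes zero but its lower end is close to it. The adjusted effect is therefore estimated with considerable uncertainty.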

Conclusion

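Our model explains 84.97% of the variance in MPG (adjusted R-squared 0.8336), and all three predictors (wt, qsec and am) are statistically significant at the 5% level. The ammanual coefficient indicates that, holding weight and quarter-mile time constant, a manual transmission is associated with about 2.94 more miles per gallon than an automatic one (p = 0.047, just below 0.05). This adjusted difference is much smaller than the 7.24 MPG gap between the raw group means in our first analysis. The model contains no interaction terms, so the reduction is not an interaction effect. It reflects confounding: transmission type is correlated with the other predictors (the manual cars in this sample are lighter on average), so part of the raw gap is attributable to those variables rather than to the transmission itself.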
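The confounding explanation can be checked directly. The sketch below compares mean weight by transmission type, then compares the am coefficient with and without the other predictors. It uses the numeric 0/1 `am` from `mtcars`, so the coefficient is named "am". `wt_means`, `unadjusted` and `adjusted` are illustrative names.

```r
# Is transmission type related to weight in this sample?
data(mtcars)
am_label <- factor(mtcars$am, levels = c(0, 1),
                   labels = c("automatic", "manual"))
wt_means <- tapply(mtcars$wt, am_label, mean)   # weight in 1000 lb
print(wt_means)

# am coefficient without adjustment (equals the raw difference in means)
unadjusted <- coef(lm(mpg ~ am, data = mtcars))[["am"]]

# am coefficient after adjusting for weight and quarter-mile time
adjusted <- coef(lm(mpg ~ wt + qsec + am, data = mtcars))[["am"]]
print(c(unadjusted = unadjusted, adjusted = adjusted))
```

Manual cars are lighter on average, and weight has a strong negative coefficient. Together these explain why adding wt shrinks the transmission coefficient.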

Appendix

Residuals plot

op <- par(mfrow = c(2, 2))   # 2x2 grid for the four lm diagnostic plots
plot(fit)
par(op)                      # restore the previous plotting layout

Density plot

ggplot(df, aes(mpg, fill = am, colour = am)) +
  geom_density(alpha = 0.1) +
  theme_bw() +
  theme(plot.title = element_text(hjust = 0.5)) +   # centre the title
  labs(title = "Density of mpg by transmission type", x = "mpg", y = "density")