You work for Motor Trend, a magazine about the automobile industry. Looking at a data set of a collection of cars, they are interested in exploring the relationship between a set of variables and miles per gallon (MPG) (outcome). They are particularly interested in the following two questions:
Is an automatic or manual transmission better for MPG?
How large is the MPG difference between automatic and manual transmissions?
library(ggplot2)
library(dplyr)
library(MASS)
data(mtcars)
str(mtcars)
df<-mtcars
# Recode am from 0/1 to descriptive labels (0 = automatic, 1 = manual)
df$am<-ifelse(df$am==1,"manual","automatic")
32 obs. of 11 variables
[, 1] mpg Miles/(US) gallon
[, 2] cyl Number of cylinders
[, 3] disp Displacement (cu.in.)
[, 4] hp Gross horsepower
[, 5] drat Rear axle ratio
[, 6] wt Weight (lb/1000)
[, 7] qsec 1/4 mile time
[, 8] vs Engine (0 = V-shaped, 1 = straight)
[, 9] am Transmission (0 = automatic, 1 = manual)
[,10] gear Number of forward gears
[,11] carb Number of carburetors
summary(mtcars)
## mpg cyl disp hp
## Min. :10.40 Min. :4.000 Min. : 71.1 Min. : 52.0
## 1st Qu.:15.43 1st Qu.:4.000 1st Qu.:120.8 1st Qu.: 96.5
## Median :19.20 Median :6.000 Median :196.3 Median :123.0
## Mean :20.09 Mean :6.188 Mean :230.7 Mean :146.7
## 3rd Qu.:22.80 3rd Qu.:8.000 3rd Qu.:326.0 3rd Qu.:180.0
## Max. :33.90 Max. :8.000 Max. :472.0 Max. :335.0
## drat wt qsec vs
## Min. :2.760 Min. :1.513 Min. :14.50 Min. :0.0000
## 1st Qu.:3.080 1st Qu.:2.581 1st Qu.:16.89 1st Qu.:0.0000
## Median :3.695 Median :3.325 Median :17.71 Median :0.0000
## Mean :3.597 Mean :3.217 Mean :17.85 Mean :0.4375
## 3rd Qu.:3.920 3rd Qu.:3.610 3rd Qu.:18.90 3rd Qu.:1.0000
## Max. :4.930 Max. :5.424 Max. :22.90 Max. :1.0000
## am gear carb
## Min. :0.0000 Min. :3.000 Min. :1.000
## 1st Qu.:0.0000 1st Qu.:3.000 1st Qu.:2.000
## Median :0.0000 Median :4.000 Median :2.000
## Mean :0.4062 Mean :3.688 Mean :2.812
## 3rd Qu.:1.0000 3rd Qu.:4.000 3rd Qu.:4.000
## Max. :1.0000 Max. :5.000 Max. :8.000
meanAM<- df %>% group_by(am) %>%
summarise(mean=mean(mpg))
meanAM[meanAM$am=="manual","mean"]-meanAM[meanAM$am=="automatic","mean"]
## mean
## 1 7.244939
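Manual transmission appears more economical than automatic: its mean MPG is about 7.24 higher (24.39 vs. 17.15).
An unpaired two-sample t-test compares the means of two independent groups. Setting var.equal = TRUE requests the pooled-variance (Student) version of the test, which assumes both groups have the same variance.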
res <- t.test(df$mpg ~ df$am, data = df, var.equal = TRUE)
res
##
## Two Sample t-test
##
## data: df$mpg by df$am
## t = -4.1061, df = 30, p-value = 0.000285
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -10.84837 -3.64151
## sample estimates:
## mean in group automatic mean in group manual
## 17.14737 24.39231
The p-value of the test is 0.000285, which is below the significance level alpha = 0.05, so the mean mpg of automatic cars differs significantly from that of manual cars. The 95% confidence interval for the difference (automatic minus manual) runs from -10.85 to -3.64 mpg.
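The t statistic reported above can be reproduced by hand, which makes the equal-variance assumption explicit. The following is a minimal sketch, not part of the original analysis; the variable names (`auto`, `manual`, `sp2`, `dof`) are illustrative, and it works directly on the 0/1 coding of `am` in `mtcars`.

```r
# Pooled-variance (Student) two-sample t-test computed step by step.
data(mtcars)
auto   <- mtcars$mpg[mtcars$am == 0]  # automatic cars
manual <- mtcars$mpg[mtcars$am == 1]  # manual cars

n1 <- length(auto)
n2 <- length(manual)

# Pooled variance: the two sample variances weighted by their degrees of freedom
sp2 <- ((n1 - 1) * var(auto) + (n2 - 1) * var(manual)) / (n1 + n2 - 2)

# Standard error of the difference in means under the equal-variance assumption
se <- sqrt(sp2 * (1 / n1 + 1 / n2))

# Test statistic (automatic minus manual), degrees of freedom, two-sided p-value
t_stat  <- (mean(auto) - mean(manual)) / se
dof     <- n1 + n2 - 2
p_value <- 2 * pt(-abs(t_stat), df = dof)

round(c(t = t_stat, df = dof, p = p_value), 6)
```

The result should agree with the `t.test()` output (t = -4.1061 on 30 degrees of freedom). If the equal-variance assumption is in doubt, calling `t.test()` without `var.equal = TRUE` gives the Welch version instead.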
fit <- lm(mpg~., data = df)
step <- stepAIC(fit, direction="both")
step$anova # display results
## Stepwise Model Path
## Analysis of Deviance Table
##
## Initial Model:
## mpg ~ cyl + disp + hp + drat + wt + qsec + vs + am + gear + carb
##
## Final Model:
## mpg ~ wt + qsec + am
##
##
## Step Df Deviance Resid. Df Resid. Dev AIC
## 1 21 147.4944 70.89774
## 2 - cyl 1 0.07987121 22 147.5743 68.91507
## 3 - vs 1 0.26852280 23 147.8428 66.97324
## 4 - carb 1 0.68546077 24 148.5283 65.12126
## 5 - gear 1 1.56497053 25 150.0933 63.45667
## 6 - drat 1 3.34455117 26 153.4378 62.16190
## 7 - disp 1 6.62865369 27 160.0665 61.51530
## 8 - hp 1 9.21946935 28 169.2859 61.30730
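The AIC column in the table above is the criterion the stepwise search minimizes. As a rough check, and assuming the values come from `extractAIC()` (which for a linear model uses n * log(RSS / n) + 2 * edf, a quantity that differs from `AIC()` by an additive constant), the final model's value can be recomputed by hand. The names `cars`, `final` and `aic_by_hand` are illustrative only.

```r
# Recompute the AIC of the final stepwise model from its residual sum of squares.
data(mtcars)
cars <- mtcars
cars$am <- ifelse(cars$am == 1, "manual", "automatic")

final <- lm(mpg ~ wt + qsec + am, data = cars)

n   <- nrow(cars)               # number of observations
rss <- sum(residuals(final)^2)  # the "Resid. Dev" column in the table
edf <- length(coef(final))      # number of estimated coefficients

aic_by_hand <- n * log(rss / n) + 2 * edf

aic_by_hand
extractAIC(final)  # returns c(edf, AIC); the second element should match
```

Each removed term raises the residual deviance slightly, but the penalty of 2 per coefficient falls faster, so the AIC keeps decreasing until only wt, qsec and am remain.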
fit <- lm(mpg~wt + qsec + am, data = df)
summary(fit)
##
## Call:
## lm(formula = mpg ~ wt + qsec + am, data = df)
##
## Residuals:
## Min 1Q Median 3Q Max
## -3.4811 -1.5555 -0.7257 1.4110 4.6610
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 9.6178 6.9596 1.382 0.177915
## wt -3.9165 0.7112 -5.507 6.95e-06 ***
## qsec 1.2259 0.2887 4.247 0.000216 ***
## ammanual 2.9358 1.4109 2.081 0.046716 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.459 on 28 degrees of freedom
## Multiple R-squared: 0.8497, Adjusted R-squared: 0.8336
## F-statistic: 52.75 on 3 and 28 DF, p-value: 1.21e-11
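Our model explains 84.97% of the variance in mpg (adjusted R-squared 0.8336), and all three predictors are statistically significant at the 0.05 level, although the transmission effect only marginally so (p = 0.047). The coefficient for manual transmission indicates that, holding weight and quarter-mile time constant, a manual transmission car gets about 2.94 miles per gallon more than an automatic one. This is much smaller than the unadjusted mean difference of 7.24 mpg from our first analysis. The gap shrinks because part of the raw difference is attributable to wt and qsec, which this model adjusts for. This is confounding rather than an interaction effect, since the model contains no interaction terms.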
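To see how much of the raw difference is absorbed by the other predictors, the transmission coefficient can be compared with and without adjustment. This is a sketch under the same recoding of `am` used earlier; the object names (`cars`, `unadjusted`, `adjusted`, `wt_by_am`) are illustrative and not part of the original analysis.

```r
# Compare the transmission effect before and after adjusting for wt and qsec.
data(mtcars)
cars <- mtcars
cars$am <- ifelse(cars$am == 1, "manual", "automatic")

unadjusted <- lm(mpg ~ am, data = cars)              # am only
adjusted   <- lm(mpg ~ wt + qsec + am, data = cars)  # model chosen by stepAIC

# The unadjusted coefficient equals the raw difference in group means
coef(unadjusted)["ammanual"]
coef(adjusted)["ammanual"]

# Mean weight by transmission type: manual cars in this sample are lighter
wt_by_am <- tapply(cars$wt, cars$am, mean)
wt_by_am

# 95% confidence interval for the adjusted transmission effect
confint(adjusted)["ammanual", ]
```

Because the manual cars in this data set are lighter on average, part of their apparent mpg advantage in the simple comparison reflects weight rather than transmission type.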
{par(mfrow=c(2,2))
plot(fit)}
ggplot(df, aes(mpg, fill = am, colour = am)) +
geom_density(alpha = 0.1) +
theme_bw() +
theme(plot.title = element_text(hjust = 0.5))+
ggtitle("Density mpg by transmission type")+
labs(x = "mpg", y ="density")