Motor Trend, a magazine about the automobile industry, is interested in exploring the relationship between a set of variables and miles per gallon (MPG). It is particularly interested in the following two questions:
“Is an automatic or manual transmission better for MPG?”
“Quantify the MPG difference between automatic and manual transmissions.”
For this analysis we used the mtcars dataset. The initial aim is to determine whether MPG is related to transmission type, and then to build a regression model that quantifies the MPG difference between automatic and manual transmissions. Finally, to arrive at the most appropriate model, we also examine the other variables and keep only those retained by stepwise selection (R's step() function, which uses AIC rather than individual p-values as its criterion).
Libraries used
library(ggplot2)
library(MASS)
str(mtcars)
## 'data.frame': 32 obs. of 11 variables:
## $ mpg : num 21 21 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2 ...
## $ cyl : num 6 6 4 6 8 6 8 4 4 6 ...
## $ disp: num 160 160 108 258 360 ...
## $ hp : num 110 110 93 110 175 105 245 62 95 123 ...
## $ drat: num 3.9 3.9 3.85 3.08 3.15 2.76 3.21 3.69 3.92 3.92 ...
## $ wt : num 2.62 2.88 2.32 3.21 3.44 ...
## $ qsec: num 16.5 17 18.6 19.4 17 ...
## $ vs : num 0 0 1 1 0 1 0 1 1 1 ...
## $ am : num 1 1 1 0 0 0 0 0 0 0 ...
## $ gear: num 4 4 4 3 3 3 3 4 4 4 ...
## $ carb: num 4 4 1 1 2 1 4 2 2 4 ...
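The `str()` output shows that `am` is stored as a numeric 0/1 code. As a small sketch, assuming the coding documented on the `?mtcars` help page (0 = automatic, 1 = manual), the indicator can be turned into a labelled factor and tabulated to see how many cars fall in each group:

```r
# mtcars ships with base R (datasets package); no extra libraries needed
# Recode the 0/1 transmission indicator as a labelled factor
transmission <- factor(mtcars$am, levels = c(0, 1),
                       labels = c("Automatic", "Manual"))

# Number of cars per transmission type
counts <- table(transmission)
print(counts)
```

The groups are unbalanced (19 automatic, 13 manual), which is one reason the Welch t-test, which does not assume equal variances, is a reasonable default below.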
A boxplot shows that cars with a manual transmission tend to have higher MPG than cars with an automatic transmission.
ggplot(mtcars, aes(x = factor(am, labels = c("Automatic", "Manual")), y = mpg)) + geom_boxplot() +
  labs(x = "Transmission", y = "Miles per Gallon", title = "Miles/Gallon per Transmission")
A Welch two-sample t-test also indicates a statistically significant difference in mean MPG between the two transmission types (p = 0.0014 < 0.05). In the output, group 0 is automatic and group 1 is manual.
t.test(mtcars$mpg ~ mtcars$am, conf.level = 0.95)
##
## Welch Two Sample t-test
##
## data: mtcars$mpg by mtcars$am
## t = -3.7671, df = 18.332, p-value = 0.001374
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -11.280194 -3.209684
## sample estimates:
## mean in group 0 mean in group 1
## 17.14737 24.39231
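The t-test reports the two group means but not their difference directly. A minimal sketch that recomputes the means with `tapply()`, takes the manual-minus-automatic gap, and flips the sign of the t-test interval (which is computed as group 0 minus group 1, i.e. automatic minus manual) so both are on the same scale:

```r
# Mean MPG by transmission (0 = automatic, 1 = manual)
group_means <- tapply(mtcars$mpg, mtcars$am, mean)
print(group_means)

# Unadjusted gap: manual minus automatic
raw_gap <- unname(group_means["1"] - group_means["0"])
print(round(raw_gap, 2))

# Welch test CI is for (automatic - manual); negate and reverse the
# bounds to get an interval for the manual-minus-automatic gap
ci <- t.test(mpg ~ am, data = mtcars)$conf.int
print(round(-rev(ci), 2))
```

This raw gap of roughly 7.2 MPG ignores every other difference between the two groups of cars, which is what the regression model below tries to account for.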
Next, we fit a linear regression of MPG on all the other variables and apply bidirectional stepwise selection, which adds or drops terms to minimize AIC, to keep the most appropriate variables.
stepwisemodel <- step(lm(mpg ~ ., data = mtcars), trace = 0, direction = "both")
summary(stepwisemodel)
##
## Call:
## lm(formula = mpg ~ wt + qsec + am, data = mtcars)
##
## Residuals:
## Min 1Q Median 3Q Max
## -3.4811 -1.5555 -0.7257 1.4110 4.6610
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 9.6178 6.9596 1.382 0.177915
## wt -3.9165 0.7112 -5.507 6.95e-06 ***
## qsec 1.2259 0.2887 4.247 0.000216 ***
## am 2.9358 1.4109 2.081 0.046716 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.459 on 28 degrees of freedom
## Multiple R-squared: 0.8497, Adjusted R-squared: 0.8336
## F-statistic: 52.75 on 3 and 28 DF, p-value: 1.21e-11
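`step()` belongs to R's stats package and ranks candidate models by AIC: at each step it makes the single addition or removal that lowers AIC the most, and stops when no move improves it. As a sanity check, this sketch refits both the full and the selected model in a standalone way and confirms that the search kept `wt`, `qsec` and `am` and ended with a lower AIC than it started with:

```r
# Full model with all ten predictors
full_model <- lm(mpg ~ ., data = mtcars)

# Bidirectional stepwise search by AIC, as in the analysis above
selected_model <- step(full_model, direction = "both", trace = 0)

# Terms kept by the search
print(attr(terms(selected_model), "term.labels"))

# step() only accepts moves that lower AIC, so the final model
# should score lower than the starting model
print(c(full = AIC(full_model), selected = AIC(selected_model)))
```

Because AIC trades goodness of fit against the number of parameters, a variable can be kept even if its p-value is only marginal, and dropped even if it is correlated with MPG on its own.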
The selected model keeps wt, qsec and am as predictors of MPG. Holding the other two variables constant, the coefficients can be read as follows:
Each additional 1000 lb of weight (wt) is associated with a decrease of roughly 3.9 MPG.
Each additional second of quarter-mile time (qsec) is associated with an increase of roughly 1.2 MPG.
A manual transmission (am = 1) is associated with roughly 2.94 more MPG than an automatic (p ≈ 0.047).
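To make the `am` coefficient concrete, the sketch below predicts MPG for two hypothetical cars that share the same weight and quarter-mile time (3000 lb and 18 s, values chosen only for illustration) and differ only in transmission. In this additive model the predicted gap is exactly the `am` coefficient, and `confint()` gives a 95% interval for it:

```r
fit <- lm(mpg ~ wt + qsec + am, data = mtcars)

# Two hypothetical cars (illustrative values): same wt (1000 lb) and
# qsec (seconds), automatic (am = 0) versus manual (am = 1)
new_cars <- data.frame(wt = c(3, 3), qsec = c(18, 18), am = c(0, 1))
pred <- predict(fit, newdata = new_cars)
print(round(pred, 2))

# Predicted manual-minus-automatic gap equals the am coefficient
gap <- unname(pred[2] - pred[1])
print(round(gap, 2))

# 95% confidence interval for the transmission effect
am_ci <- confint(fit)["am", ]
print(round(am_ci, 2))
```

The interval excludes zero, consistent with the p-value just below 0.05, but it is wide: the data are compatible with anything from a negligible advantage to several MPG.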
The model explains about 85% of the variance in MPG (multiple R² = 0.8497, adjusted R² = 0.8336), and the residual plots suggest that the residuals are approximately normally distributed.
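R² is the share of the total variation in MPG that the fitted model accounts for, R² = 1 − RSS/TSS, and adjusted R² penalizes it for the number of predictors. A short sketch recomputing both from the residuals, to show where the `Multiple R-squared` and `Adjusted R-squared` figures in the summary come from:

```r
fit <- lm(mpg ~ wt + qsec + am, data = mtcars)

rss <- sum(residuals(fit)^2)                    # residual sum of squares
tss <- sum((mtcars$mpg - mean(mtcars$mpg))^2)   # total sum of squares
r2  <- 1 - rss / tss

# Adjusted R-squared for n observations and p = 3 predictors
n <- nrow(mtcars)
p <- 3
adj_r2 <- 1 - (1 - r2) * (n - 1) / (n - p - 1)

print(round(c(r2 = r2, adj_r2 = adj_r2), 4))
```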
MPG is therefore largely explained by weight, quarter-mile time and transmission type. However, other factors not captured by this model or dataset may explain MPG better.
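For the second question it helps to compare the transmission effect with and without the other predictors. This sketch fits MPG on `am` alone (whose coefficient equals the raw difference in group means from the t-test) and sets it beside the `am` coefficient from the selected model. The adjusted effect is much smaller, which suggests that part of the raw gap reflects differences in weight and quarter-mile time between the two groups rather than transmission alone:

```r
# Unadjusted: transmission as the only predictor
raw_fit <- lm(mpg ~ am, data = mtcars)

# Adjusted: the model chosen by stepwise selection
adj_fit <- lm(mpg ~ wt + qsec + am, data = mtcars)

effects <- c(unadjusted = coef(raw_fit)[["am"]],
             adjusted   = coef(adj_fit)[["am"]])
print(round(effects, 2))

# Mean weight (1000 lb) by transmission: manual cars are lighter,
# which helps explain why the effect shrinks after adjustment
wt_means <- tapply(mtcars$wt, mtcars$am, mean)
print(round(wt_means, 2))
```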
The plot below shows the selected predictors against MPG: weight on the x-axis, transmission type as colour and quarter-mile time as point size.
mtcars$amf <- factor(mtcars$am, labels = c("Automatic", "Manual"))
ggplot(mtcars, aes(x = wt, y = mpg, colour = amf, size = qsec)) + geom_point()
A pairwise scatter plot matrix shows the relationships between every pair of variables:
pairs(mtcars)
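`pairs()` shows the relationships visually; the numeric counterpart is the correlation matrix. A small sketch listing each variable's Pearson correlation with `mpg`, sorted from most negative to most positive. It reads `datasets::mtcars` directly so that the `amf` factor column added earlier does not make `cor()` fail:

```r
cars <- datasets::mtcars   # original 11 numeric columns only

# Pearson correlation of every variable with mpg, excluding mpg itself
mpg_cor <- cor(cars)[, "mpg"]
mpg_cor <- sort(mpg_cor[names(mpg_cor) != "mpg"])
print(round(mpg_cor, 3))
```

Weight has the strongest (negative) correlation with MPG, while `cyl`, `disp` and `hp` are almost as strongly correlated with it and with each other; this overlap is one reason stepwise selection keeps only a few of them.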
Finally, the diagnostic (residual) plots of the selected model:
par(mfrow=c(2,2))
plot(stepwisemodel)