Brief

Motor Trend, a magazine about the automobile industry, is interested in exploring the relationship between a set of variables and miles per gallon (MPG). The magazine is particularly interested in the following two questions:

“Is an automatic or manual transmission better for MPG?”

“Quantify the MPG difference between automatic and manual transmissions”

Summary

This analysis uses the mtcars dataset. The first aim is to determine whether MPG is related to transmission type; the second is to build a regression model that quantifies the MPG difference between automatic and manual transmissions. To arrive at the most appropriate model, we also examine the other variables and keep those selected by a stepwise procedure.

Data Analysis

Libraries used

library(ggplot2)
library(MASS)

Exploratory Analysis

str(mtcars)
## 'data.frame':    32 obs. of  11 variables:
##  $ mpg : num  21 21 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2 ...
##  $ cyl : num  6 6 4 6 8 6 8 4 4 6 ...
##  $ disp: num  160 160 108 258 360 ...
##  $ hp  : num  110 110 93 110 175 105 245 62 95 123 ...
##  $ drat: num  3.9 3.9 3.85 3.08 3.15 2.76 3.21 3.69 3.92 3.92 ...
##  $ wt  : num  2.62 2.88 2.32 3.21 3.44 ...
##  $ qsec: num  16.5 17 18.6 19.4 17 ...
##  $ vs  : num  0 0 1 1 0 1 0 1 1 1 ...
##  $ am  : num  1 1 1 0 0 0 0 0 0 0 ...
##  $ gear: num  4 4 4 3 3 3 3 4 4 4 ...
##  $ carb: num  4 4 1 1 2 1 4 2 2 4 ...

A boxplot shows that cars with a manual transmission tend to have higher MPG than cars with an automatic transmission.

# qplot() is deprecated in ggplot2 3.4.0 and later; the same boxplot built with ggplot()
ggplot(mtcars, aes(x = factor(am, labels = c("Automatic", "Manual")), y = mpg)) +
  geom_boxplot() +
  labs(x = "Transmission", y = "Miles per Gallon", title = "Miles/Gallon per Transmission")

A Welch two-sample t-test confirms that the difference in mean MPG between the two transmission types is statistically significant (p-value = 0.0014 < 0.05).

t.test(mtcars$mpg~mtcars$am,conf.level=0.95)
## 
##  Welch Two Sample t-test
## 
## data:  mtcars$mpg by mtcars$am
## t = -3.7671, df = 18.332, p-value = 0.001374
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -11.280194  -3.209684
## sample estimates:
## mean in group 0 mean in group 1 
##        17.14737        24.39231
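
The size of the gap can also be read off the test object rather than from the printed output. The following is a minimal sketch, assuming a fresh copy of the built-in dataset; the names `cars`, `welch`, `gap` and `gap_ci` are illustrative and not part of the original analysis.

```r
# Fresh copy of the built-in data (am: 0 = automatic, 1 = manual)
cars <- datasets::mtcars

# Same Welch two-sample t-test as above, written with a data argument
welch <- t.test(mpg ~ am, data = cars, conf.level = 0.95)

# Manual minus automatic group means
gap <- unname(welch$estimate[2] - welch$estimate[1])

# t.test() reports the interval for (automatic - manual);
# negate and reverse it to get (manual - automatic)
gap_ci <- -rev(as.numeric(welch$conf.int))

round(c(difference = gap, lower = gap_ci[1], upper = gap_ci[2]), 2)
```

Based on the output printed above, this corresponds to an unadjusted gap of about 7.24 MPG in favour of manual cars, with a 95% interval of roughly 3.21 to 11.28.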

Regression Analysis

Stepwise Regression

We now fit a regression model using all the variables and apply the stepwise algorithm, which adds or drops terms to minimize AIC, to keep the most appropriate variables.

stepwisemodel = step(lm(data = mtcars, mpg ~ .), trace=0, direction=c("both"))
summary(stepwisemodel)
## 
## Call:
## lm(formula = mpg ~ wt + qsec + am, data = mtcars)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -3.4811 -1.5555 -0.7257  1.4110  4.6610 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   9.6178     6.9596   1.382 0.177915    
## wt           -3.9165     0.7112  -5.507 6.95e-06 ***
## qsec          1.2259     0.2887   4.247 0.000216 ***
## am            2.9358     1.4109   2.081 0.046716 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2.459 on 28 degrees of freedom
## Multiple R-squared:  0.8497, Adjusted R-squared:  0.8336 
## F-statistic: 52.75 on 3 and 28 DF,  p-value: 1.21e-11
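
To see what the stepwise search is optimizing, the AIC of the starting model can be compared with that of the selected model. This is a rough sketch rather than part of the original analysis; `cars`, `full_fit`, `step_fit`, `kept` and `aic_table` are illustrative names, and a clean copy of the data is used so that no derived columns enter the full model.

```r
# Clean copy of the built-in data, so only the 11 original columns are used
cars <- datasets::mtcars

# Starting model with every predictor, then the same stepwise search as above
full_fit <- lm(mpg ~ ., data = cars)
step_fit <- step(full_fit, direction = "both", trace = 0)

# Terms retained by the search
kept <- attr(terms(step_fit), "term.labels")

# AIC of the starting and the selected model (lower is better)
aic_table <- c(full = AIC(full_fit), stepwise = AIC(step_fit))

kept
aic_table
```

AIC rewards fit but penalizes each extra coefficient, which is why the search can prefer a three-variable model over the full ten-variable one. Selection by AIC is not the same as selection by p-value, so a retained term need not be significant at the 5% level.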

The stepwise search retains the variables wt, qsec and am to explain MPG. The coefficients can be read as follows, each holding the other two variables fixed:

Each additional 1,000 lb of weight (wt) is associated with a decrease of roughly 3.9 MPG

Each additional second of quarter-mile time (qsec) is associated with an increase of roughly 1.2 MPG

On average, a manual transmission is associated with 2.94 more MPG than an automatic (p = 0.047)
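
The transmission effect can be put next to its unadjusted counterpart, together with a confidence interval. This is a minimal sketch under the assumption that the final model is `mpg ~ wt + qsec + am`, as in the summary above; the object names (`cars`, `unadjusted`, `adjusted`, `am_effects`, `am_ci`) are hypothetical.

```r
cars <- datasets::mtcars

# Transmission alone versus transmission adjusted for weight and quarter-mile time
unadjusted <- lm(mpg ~ am, data = cars)
adjusted   <- lm(mpg ~ wt + qsec + am, data = cars)

# Estimated manual-minus-automatic difference under each model
am_effects <- c(unadjusted = unname(coef(unadjusted)["am"]),
                adjusted   = unname(coef(adjusted)["am"]))

# 95% confidence interval for the adjusted transmission effect
am_ci <- confint(adjusted, "am", level = 0.95)

round(am_effects, 2)
am_ci
```

With a single binary predictor, the unadjusted coefficient is simply the difference between the two group means, so comparing the two numbers shows how much of the raw gap is absorbed by weight and quarter-mile time.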

The model explains about 85% of the variance in MPG (adjusted R-squared 0.83), and the residual plots in the Appendix suggest that the residuals are approximately normally distributed.
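
The visual impression of normality can be supplemented with a formal check. The sketch below uses the Shapiro-Wilk test, which is an addition here and not part of the original analysis; `cars`, `fit`, `res`, `r2` and `normality` are illustrative names.

```r
cars <- datasets::mtcars

# Final model from the stepwise search
fit <- lm(mpg ~ wt + qsec + am, data = cars)

# Share of variance explained
r2 <- summary(fit)$r.squared

# Shapiro-Wilk test on the residuals (null hypothesis: residuals are normal)
res <- residuals(fit)
normality <- shapiro.test(res)

r2
normality$p.value
```

A p-value above 0.05 would be consistent with normally distributed residuals, but with only 32 cars the test has limited power, so it is best read alongside the Q-Q plot rather than instead of it.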

Conclusion

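MPG can be explained by weight, quarter-mile time and transmission type. Manual cars average about 7.2 MPG more than automatic cars in the raw data, but once weight and quarter-mile time are accounted for the estimated advantage drops to about 2.9 MPG and is only marginally significant (p = 0.047). Other factors not considered here may explain MPG better.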

The plot below shows the three variables chosen to explain MPG: weight on the horizontal axis, transmission as colour and quarter-mile time as point size.

mtcars$amf <- factor(mtcars$am, labels = c("Automatic", "Manual"))
ggplot(mtcars, aes(x = wt, y = mpg, colour = amf, size = qsec)) + geom_point()

Appendix

A pairwise scatter plot helps to show the relationships among all the variables

pairs(mtcars)
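
The scatter plots can be complemented with the correlations themselves. This is a small sketch, assuming a fresh copy of the data; `cars` and `mpg_cor` are illustrative names.

```r
cars <- datasets::mtcars

# Pearson correlation of every variable with mpg
mpg_cor <- cor(cars)[, "mpg"]

# Drop mpg's correlation with itself and sort by strength
mpg_cor <- mpg_cor[names(mpg_cor) != "mpg"]
mpg_cor <- mpg_cor[order(abs(mpg_cor), decreasing = TRUE)]

round(mpg_cor, 2)
```

Several predictors are strongly correlated with one another as well as with mpg, which is one reason a selection procedure keeps only a few of them.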

The residual plots of the final model are shown below

par(mfrow=c(2,2))
plot(stepwisemodel)