Furawa
03 APRIL 2019
We will look at the mtcars dataset and explore the relationship between a set of variables and miles per gallon (MPG), the outcome. The data was extracted from the 1974 Motor Trend US magazine and comprises fuel consumption and 10 aspects of automobile design and performance for 32 automobiles (1973-74 models).
The goal is to answer two questions:
1. Is an automatic or manual transmission better for MPG?
2. What is the MPG difference between automatic and manual transmissions?
The analysis below suggests that manual transmission gives higher MPG for lighter cars, while automatic transmission gives higher MPG for heavier cars. Without adjusting for other variables, manual cars average about 7 MPG more than automatic cars.
# Loading the needed packages
library(ggplot2)
library(dplyr)
library(broom)
library(plotly)
glimpse(mtcars) # Structure of the data
## Observations: 32
## Variables: 11
## $ mpg <dbl> 21.0, 21.0, 22.8, 21.4, 18.7, 18.1, 14.3, 24.4, 22.8, 19....
## $ cyl <dbl> 6, 6, 4, 6, 8, 6, 8, 4, 4, 6, 6, 8, 8, 8, 8, 8, 8, 4, 4, ...
## $ disp <dbl> 160.0, 160.0, 108.0, 258.0, 360.0, 225.0, 360.0, 146.7, 1...
## $ hp <dbl> 110, 110, 93, 110, 175, 105, 245, 62, 95, 123, 123, 180, ...
## $ drat <dbl> 3.90, 3.90, 3.85, 3.08, 3.15, 2.76, 3.21, 3.69, 3.92, 3.9...
## $ wt <dbl> 2.620, 2.875, 2.320, 3.215, 3.440, 3.460, 3.570, 3.190, 3...
## $ qsec <dbl> 16.46, 17.02, 18.61, 19.44, 17.02, 20.22, 15.84, 20.00, 2...
## $ vs <dbl> 0, 0, 1, 1, 0, 1, 0, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 1, 1, ...
## $ am <dbl> 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, ...
## $ gear <dbl> 4, 4, 4, 3, 3, 3, 3, 4, 4, 4, 4, 3, 3, 3, 3, 3, 3, 4, 4, ...
## $ carb <dbl> 4, 4, 1, 1, 2, 1, 4, 2, 2, 4, 4, 3, 3, 3, 4, 4, 4, 1, 2, ...
**All the supporting figures are in the Appendix section.**
Examining the pairwise scatterplots of the data (pairs plot), we can see a strong negative correlation between mpg and variables such as “cyl”, “disp”, “hp” and “wt”. The boxplot shows that, in general, manual transmission yields higher MPG values. Let us verify this with some inference.
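The correlations read off the pairs plot can also be checked numerically. This is a minimal sketch using base R's cor(); the name mpgCor is only an illustrative choice and does not come from the original analysis.

```r
# Pearson correlation of mpg with the four variables named above
mpgCor <- cor(mtcars$mpg, mtcars[, c("cyl", "disp", "hp", "wt")])
round(mpgCor, 2)
```

All four values come out negative: MPG falls as the number of cylinders, displacement, horsepower or weight rises.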
## mean in group 0 mean in group 1
## 17.14737 24.39231
Our null hypothesis is that the MPG values of manual and automatic cars come from the same population, that is, that the two group means are equal. As the p-value is 0.0013736, which is below 0.05, we reject the null hypothesis. The estimates also show that the mean MPG of manual cars (group 1) is about 7 higher than the mean MPG of automatic cars (group 0).
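The code behind this test is not shown in the text. The output format matches base R's t.test(), so a sketch under that assumption (default Welch two-sample test, with ttest as an illustrative name) would be:

```r
# Welch two-sample t-test of mpg by transmission (am: 0 = automatic, 1 = manual)
ttest <- t.test(mpg ~ am, data = mtcars)

ttest$estimate  # group means
ttest$p.value   # p-value of the test

# Difference in means, manual minus automatic
meanDiff <- unname(ttest$estimate[2] - ttest$estimate[1])
meanDiff
```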
Let’s go deeper with some regression analysis.
Let’s start with a simple linear regression
This linear model shows that manual transmission is associated with an increase of 7.245 in expected MPG compared with automatic transmission. The Adjusted R-squared indicates that the model explains only about 33% of the variance of MPG, which is low, so we need to add other variables to the model.
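The model call itself is not shown, so the following is a sketch assuming the simple regression is mpg on am alone; simpleModel is an illustrative name.

```r
# Simple linear regression of mpg on transmission (am: 0 = automatic, 1 = manual)
simpleModel <- lm(mpg ~ am, data = mtcars)

coef(simpleModel)                    # intercept = automatic mean, am = manual minus automatic
summary(simpleModel)$adj.r.squared   # share of MPG variance explained, adjusted
```

Because am is a 0/1 indicator, its coefficient equals the difference between the two group means from the t-test.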
The parallel slopes plot (see Appendix) shows that wt (weight) and am (transmission) are related, as automatic cars tend to be heavier than manual cars, which suggests an interaction between the two. Let’s use this interaction to build a new multiple regression.
summary(fullModel <- lm(mpg ~ . , mtcars)) # Model with all variables
stepFullModel <- step(fullModel) # step to find the best model
The step function searches for the best model, which is the one with the lowest AIC (Akaike’s information criterion). In our case the best model is mpg ~ wt + qsec + am, which has an Adjusted R-squared of 0.8336, meaning that the model explains about 83% of the variance of MPG.
The summary also tells us that all three predictor coefficients of this model are significant at the .05 significance level. At this point we can add the interaction between wt and am to the best model.
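The selected formula, its Adjusted R-squared and the coefficient p-values can be read from the fitted object. A minimal sketch, repeating the model fit so that it runs on its own; trace = 0 only silences the step-by-step log, and selectedTerms is an illustrative name.

```r
# Refit the full model and run the AIC-based stepwise search quietly
fullModel <- lm(mpg ~ ., data = mtcars)
stepFullModel <- step(fullModel, trace = 0)

formula(stepFullModel)                       # formula of the selected model
selectedTerms <- attr(terms(stepFullModel), "term.labels")
summary(stepFullModel)$adj.r.squared         # adjusted R-squared
summary(stepFullModel)$coefficients[, 4]     # p-value of each coefficient
```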
Now we have an Adjusted R-squared of 0.8804, which means that the model explains about 88% of the variance of MPG. This is better than the 83% of the previous model, so we can say it is a better model.
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 9.723053 5.8990407 1.648243 0.1108925394
## wt -2.936531 0.6660253 -4.409038 0.0001488947
## qsec 1.016974 0.2520152 4.035366 0.0004030165
## am 14.079428 3.4352512 4.098515 0.0003408693
## wt:am -4.141376 1.1968119 -3.460340 0.0018085763
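The coefficient table above can be reproduced with a sketch like the following, assuming the interaction model is the stepwise formula plus the wt:am term; interactionModel is an illustrative name.

```r
# Best stepwise model plus the weight-by-transmission interaction
interactionModel <- lm(mpg ~ wt + qsec + am + wt:am, data = mtcars)

summary(interactionModel)$coefficients    # estimates, standard errors, t and p values
summary(interactionModel)$adj.r.squared   # adjusted R-squared
```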
We can see that cars with manual transmission get, on average, 14.08 + (-4.14)*wt more MPG than cars with automatic transmission, holding wt and qsec constant.
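To make this formula concrete, it can be evaluated at a few weights. A small sketch using the coefficients copied from the table above; amEffect and breakEven are illustrative names, and wt is in units of 1000 lbs as in mtcars.

```r
# Coefficients copied from the coefficient table
b_am    <- 14.079428   # am main effect
b_wt_am <- -4.141376   # wt:am interaction

# Expected MPG difference (manual minus automatic) at a given weight
amEffect <- function(wt) b_am + b_wt_am * wt

amEffect(c(2, 3, 4))

# Weight at which the estimated difference is zero
breakEven <- -b_am / b_wt_am
breakEven
```

The estimated advantage of manual transmission shrinks as weight increases and changes sign at roughly 3,400 lbs, which is why manual looks better for lighter cars and automatic for heavier ones.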
The residual plots (see Appendix) suggest that the residuals are approximately normally distributed, because the points lie close to the line in the Q-Q plot, and they show no obvious outliers.
Boxplot of MPG vs am (transmission)
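The figure itself is not reproduced here. One way to draw a comparable boxplot with ggplot2 is sketched below; the labels, the carsLabelled data frame and the styling are assumptions, not the original plotting code.

```r
library(ggplot2)

# Label the transmission codes (0 = automatic, 1 = manual) for the x axis
carsLabelled <- transform(mtcars, am = factor(am, labels = c("Automatic", "Manual")))

boxPlot <- ggplot(carsLabelled, aes(x = am, y = mpg, fill = am)) +
  geom_boxplot() +
  labs(x = "Transmission", y = "Miles per gallon (MPG)") +
  theme(legend.position = "none")
boxPlot
```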
Parallel slopes of MPG vs Weight by Transmission
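The figure itself is not reproduced here. A parallel slopes plot of this kind could be sketched as follows, assuming a model with a common wt slope and a separate intercept per transmission; parallelModel, carsLabelled and the fit column are illustrative names.

```r
library(ggplot2)

# Label the transmission codes (0 = automatic, 1 = manual)
carsLabelled <- transform(mtcars, am = factor(am, labels = c("Automatic", "Manual")))

# Parallel slopes model: one wt slope, one intercept per transmission
parallelModel <- lm(mpg ~ wt + am, data = carsLabelled)
carsLabelled$fit <- fitted(parallelModel)

slopePlot <- ggplot(carsLabelled, aes(x = wt, y = mpg, colour = am)) +
  geom_point() +
  geom_line(aes(y = fit)) +
  labs(x = "Weight (1000 lbs)", y = "Miles per gallon (MPG)", colour = "Transmission")
slopePlot
```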
Residual Plots
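The figure itself is not reproduced here. Assuming the diagnostics refer to the interaction model, the standard base R residual plots (including the Q-Q plot) can be drawn with this sketch; interactionModel is an illustrative name.

```r
# Model assumed to be the one diagnosed: stepwise formula plus wt:am
interactionModel <- lm(mpg ~ wt + qsec + am + wt:am, data = mtcars)

# 2 x 2 grid of the default lm diagnostic plots
oldPar <- par(mfrow = c(2, 2))
plot(interactionModel)
par(oldPar)  # restore the previous plotting layout
```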