Overview

In this report we will study the influence of transmission type on the MPG (miles per gallon) that cars achieve, using the mtcars data set. We will see that a good model of fuel consumption also needs to take into account the weight and the horsepower of the car. With those two variables in the model, manual cars are estimated to travel about 2.1 MPG more than automatic cars.

Exploratory analysis

Let’s see how MPG is distributed for automatic and manual transmissions:

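mtcars$am <- factor(mtcars$am, labels = c("Automatic", "Manual"))  # in mtcars, 0 = automatic, 1 = manual
plot(mpg ~ am, data = mtcars, xlab = "Transmission", ylab = "MPG", main = "Consumption depending on transmission")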

We can quantify the relationship between MPG and transmission with a simple linear regression:

fit<-lm(mpg ~ factor(am), mtcars)
summary(fit)
## 
## Call:
## lm(formula = mpg ~ factor(am), data = mtcars)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -9.3923 -3.0923 -0.2974  3.2439  9.5077 
## 
## Coefficients:
##                  Estimate Std. Error t value Pr(>|t|)    
## (Intercept)        17.147      1.125  15.247 1.13e-15 ***
## factor(am)Manual    7.245      1.764   4.106 0.000285 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 4.902 on 30 degrees of freedom
## Multiple R-squared:  0.3598, Adjusted R-squared:  0.3385 
## F-statistic: 16.86 on 1 and 30 DF,  p-value: 0.000285

Only 36% of the variation in MPG is explained by this model (R-squared = 0.36), so transmission alone is not enough. We need to redefine the model as a multivariable one. Let’s check which variables to include.
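As a quick sanity check on this single-predictor model, the slope of a regression on a two-level factor is simply the difference between the two group means. The sketch below works on a fresh copy of the data set (the name `cars` and the other variable names are chosen here for illustration) and assumes the standard mtcars coding of `am` (0 = automatic, 1 = manual).

```r
# Work on a fresh copy so that am keeps its original 0/1 coding
cars <- datasets::mtcars

# Mean MPG per transmission type (0 = automatic, 1 = manual)
group_means <- tapply(cars$mpg, cars$am, mean)
mean_diff <- unname(group_means["1"] - group_means["0"])

# Slope of the single-predictor model; it should equal the difference in means
slope <- unname(coef(lm(mpg ~ factor(am), data = cars))[2])

print(round(group_means, 3))
print(round(c(mean_diff = mean_diff, slope = slope), 3))
```

Both numbers should agree with the 7.245 estimate in the summary above, which is the raw difference before adjusting for any other variable.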

Choosing the model

# Reload the original data set so that all columns are numeric again, as cor() requires
data(mtcars)

We can check which variables are most strongly correlated with MPG:

sort(abs(cor(mtcars)[1,]), decreasing=TRUE)
##       mpg        wt       cyl      disp        hp      drat        vs 
## 1.0000000 0.8676594 0.8521620 0.8475514 0.7761684 0.6811719 0.6640389 
##        am      carb      gear      qsec 
## 0.5998324 0.5509251 0.4802848 0.4186840

This means that MPG is strongly correlated with weight, number of cylinders, displacement and horsepower. However, if we look a little closer, we can also see that cylinders, displacement and horsepower are highly correlated with each other, and that displacement is also strongly correlated with weight.

sort(abs(cor(mtcars)[2,]), decreasing=TRUE)
##       cyl      disp       mpg        hp        vs        wt      drat 
## 1.0000000 0.9020329 0.8521620 0.8324475 0.8108118 0.7824958 0.6999381 
##      qsec      carb        am      gear 
## 0.5912421 0.5269883 0.5226070 0.4926866
sort(abs(cor(mtcars)[3,]), decreasing=TRUE)
##      disp       cyl        wt       mpg        hp        vs      drat 
## 1.0000000 0.9020329 0.8879799 0.8475514 0.7909486 0.7104159 0.7102139 
##        am      gear      qsec      carb 
## 0.5912270 0.5555692 0.4336979 0.3949769
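The same relationships can be read from a single signed sub-matrix restricted to MPG and its four strongest correlates. This is only a convenience sketch; the names `cars`, `vars` and `cor_sub` are introduced here for illustration.

```r
# Fresh numeric copy of the data set
cars <- datasets::mtcars

# MPG and the four variables most strongly correlated with it
vars <- c("mpg", "wt", "cyl", "disp", "hp")

# Signed correlations, rounded for readability
cor_sub <- round(cor(cars[, vars]), 2)
print(cor_sub)
```

Unlike the sorted absolute values, this view keeps the signs, so it also shows that MPG falls as weight, cylinders, displacement and horsepower increase.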

Now we need to choose which of these three parameters (cylinders, displacement or horsepower) to add to the model alongside weight. First we check whether each one significantly improves the model:

# Candidate models: fit2 adds weight; fit3 to fit5 each add one further predictor
fit2<-lm(mpg ~ factor(am) + wt, mtcars)
fit3<-lm(mpg ~ factor(am) + wt + factor(cyl), mtcars)
fit4<-lm(mpg ~ factor(am) + wt + disp, mtcars)
fit5<-lm(mpg ~ factor(am) + wt + hp, mtcars)

anova(fit, fit2, fit3)
## Analysis of Variance Table
## 
## Model 1: mpg ~ factor(am)
## Model 2: mpg ~ factor(am) + wt
## Model 3: mpg ~ factor(am) + wt + factor(cyl)
##   Res.Df    RSS Df Sum of Sq       F    Pr(>F)    
## 1     30 720.90                                   
## 2     29 278.32  1    442.58 65.3095 1.107e-08 ***
## 3     27 182.97  2     95.35  7.0353  0.003473 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
anova(fit, fit2, fit4)
## Analysis of Variance Table
## 
## Model 1: mpg ~ factor(am)
## Model 2: mpg ~ factor(am) + wt
## Model 3: mpg ~ factor(am) + wt + disp
##   Res.Df    RSS Df Sum of Sq       F    Pr(>F)    
## 1     30 720.90                                   
## 2     29 278.32  1    442.58 50.2610 1.032e-07 ***
## 3     28 246.56  1     31.76  3.6072   0.06788 .  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
anova(fit, fit2, fit5)
## Analysis of Variance Table
## 
## Model 1: mpg ~ factor(am)
## Model 2: mpg ~ factor(am) + wt
## Model 3: mpg ~ factor(am) + wt + hp
##   Res.Df    RSS Df Sum of Sq      F    Pr(>F)    
## 1     30 720.90                                  
## 2     29 278.32  1    442.58 68.734 5.071e-09 ***
## 3     28 180.29  1     98.03 15.224 0.0005464 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Cylinders and horsepower are relevant: their p-values are below 5%, so the null hypothesis that the extra parameter adds nothing can be rejected and either one can be included in the model. Displacement is not significant at that level (p = 0.068). We still need to choose between cylinders and horsepower, and we will select the one whose model explains more of the variation (higher R-squared).
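To make the test behind these tables explicit, the F statistic for adding horsepower can be reproduced by hand from the residual sums of squares of the two nested models. This is a minimal sketch; the names `small`, `large` and the other variables are chosen here for illustration.

```r
cars <- datasets::mtcars

# Nested models: the larger one adds hp to the smaller one
small <- lm(mpg ~ factor(am) + wt, data = cars)
large <- lm(mpg ~ factor(am) + wt + hp, data = cars)

# Residual sums of squares and the number of extra parameters
rss_small <- sum(resid(small)^2)
rss_large <- sum(resid(large)^2)
df_extra  <- df.residual(small) - df.residual(large)

# F = (drop in RSS per extra parameter) / (residual variance of the larger model)
f_stat <- ((rss_small - rss_large) / df_extra) / (rss_large / df.residual(large))
p_val  <- pf(f_stat, df_extra, df.residual(large), lower.tail = FALSE)

print(c(F = f_stat, p = p_val))
```

The result should match the last row of the horsepower table above (F of about 15.2), because `anova()` also uses the residual variance of the largest model as the denominator.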

summary(fit3)$r.squared
## [1] 0.8375127
summary(fit5)$r.squared
## [1] 0.8398903

The model with weight and horsepower has the slightly higher R-squared value (0.840 versus 0.838), so it explains marginally more of the variation. We will use it to model the relationship between MPG and transmission and to quantify the MPG difference between automatic and manual cars.
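Because plain R-squared never decreases when parameters are added, and the cylinder model spends two extra parameters where the horsepower model spends one, adjusted R-squared is arguably the fairer yardstick for this comparison. The sketch below refits both models under illustrative names (`fit_cyl`, `fit_hp`) and compares that statistic.

```r
cars <- datasets::mtcars

# Same two candidate models as above, refitted on a fresh copy of the data
fit_cyl <- lm(mpg ~ factor(am) + wt + factor(cyl), data = cars)
fit_hp  <- lm(mpg ~ factor(am) + wt + hp, data = cars)

# Adjusted R-squared penalises each additional parameter
adj <- c(cyl = summary(fit_cyl)$adj.r.squared,
         hp  = summary(fit_hp)$adj.r.squared)
print(round(adj, 4))
```

The horsepower model should still come out ahead, which supports the choice made here.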

summary(fit5)
## 
## Call:
## lm(formula = mpg ~ factor(am) + wt + hp, data = mtcars)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -3.4221 -1.7924 -0.3788  1.2249  5.5317 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 34.002875   2.642659  12.867 2.82e-13 ***
## factor(am)1  2.083710   1.376420   1.514 0.141268    
## wt          -2.878575   0.904971  -3.181 0.003574 ** 
## hp          -0.037479   0.009605  -3.902 0.000546 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2.538 on 28 degrees of freedom
## Multiple R-squared:  0.8399, Adjusted R-squared:  0.8227 
## F-statistic: 48.96 on 3 and 28 DF,  p-value: 2.908e-11

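From this summary, holding weight and horsepower constant, manual cars are estimated to travel about 2.1 MPG more than automatic cars. Note, however, that this coefficient is not statistically significant at the 5% level (p = 0.14).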
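The uncertainty around this estimate can be expressed as a confidence interval for the transmission coefficient. The sketch below refits the same model under the illustrative name `fit_hp` and assumes the original 0/1 coding of `am`, so the coefficient is called `factor(am)1`.

```r
cars <- datasets::mtcars
fit_hp <- lm(mpg ~ factor(am) + wt + hp, data = cars)

# 95% confidence interval for the manual-versus-automatic difference
ci <- confint(fit_hp, "factor(am)1", level = 0.95)
print(round(ci, 2))
```

Since the p-value of this coefficient is above 0.05, the 95% interval necessarily includes zero.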

We can check the model assumptions with the diagnostic plots:

par(mfrow = c(2,2))
plot(fit5)

In the Q-Q panel the residuals appear to be approximately normally distributed.
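A formal test can complement the visual check. The sketch below applies the Shapiro-Wilk test from base R to the residuals of the same model, refitted under the illustrative name `fit_hp`; it is one possible check, not part of the original analysis.

```r
cars <- datasets::mtcars
fit_hp <- lm(mpg ~ factor(am) + wt + hp, data = cars)

# Shapiro-Wilk test: the null hypothesis is that the residuals are normal
sw <- shapiro.test(resid(fit_hp))
print(sw)
```

A small p-value would indicate a departure from normality. A large one is consistent with the plots but, with only 32 observations, does not prove that the residuals are normal.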

Conclusion

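After adjusting for weight and horsepower, manual transmission cars are estimated to travel about 2.1 MPG more than automatic cars, although this difference is not statistically significant in this sample (p = 0.14).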