Executive summary

In this report, we explore the relationship between MPG (miles per gallon) and transmission type (automatic or manual), and we try to quantify the MPG difference between the two.

Exploratory analysis

library(datasets)
data(mtcars)
head(mtcars)
##                    mpg cyl disp  hp drat    wt  qsec vs am gear carb
## Mazda RX4         21.0   6  160 110 3.90 2.620 16.46  0  1    4    4
## Mazda RX4 Wag     21.0   6  160 110 3.90 2.875 17.02  0  1    4    4
## Datsun 710        22.8   4  108  93 3.85 2.320 18.61  1  1    4    1
## Hornet 4 Drive    21.4   6  258 110 3.08 3.215 19.44  1  0    3    1
## Hornet Sportabout 18.7   8  360 175 3.15 3.440 17.02  0  0    3    2
## Valiant           18.1   6  225 105 2.76 3.460 20.22  1  0    3    1

The mtcars dataset contains 32 observations of 11 variables.

mtcars$vs <- as.factor(mtcars$vs)
mtcars$am <- as.factor(mtcars$am)
mtcars$am.label <- factor(mtcars$am, labels=c("Automatic","Manual")) # 0=automatic, 1=manual

summary(mtcars$mpg)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   10.40   15.43   19.20   20.09   22.80   33.90
boxplot(mpg ~ am.label, data = mtcars, col = (c("green","blue")), ylab = "Miles Per Gallon", xlab= "Transmission Type")

As the boxplot shows, cars with manual transmission tend to have higher MPG than cars with automatic transmission. We analyze this further in the remaining sections.

The mean MPG values for cars with Automatic and Manual transmission are:

aggregate(mpg~am, data = mtcars, mean)
##   am      mpg
## 1  0 17.14737
## 2  1 24.39231

Hypothesis Analysis

We hypothesize that automatic cars have a lower MPG than manual cars. We test whether the difference is significant with a two-sample t-test.

t.test(mtcars[mtcars$am.label == "Automatic",]$mpg, mtcars[mtcars$am.label == "Manual",]$mpg)
## 
##  Welch Two Sample t-test
## 
## data:  mtcars[mtcars$am.label == "Automatic", ]$mpg and mtcars[mtcars$am.label == "Manual", ]$mpg
## t = -3.7671, df = 18.332, p-value = 0.001374
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -11.280194  -3.209684
## sample estimates:
## mean of x mean of y 
##  17.14737  24.39231

Since the p-value is 0.001374, well below the usual 0.05 threshold, we conclude that the difference is significant.
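The hypothesis as stated is directional (automatic lower than manual), while the test above is two-sided. A one-sided version of the same Welch test may be a closer match. The following is a minimal sketch, assuming the formula interface of t.test; the names dat and one.sided are introduced here for illustration and are not part of the original analysis.

```r
# Work on a pristine copy of the data so earlier transformations do not matter.
# `dat` and `one.sided` are illustrative names, not from the report.
dat <- datasets::mtcars
dat$am.label <- factor(dat$am, labels = c("Automatic", "Manual"))  # 0 = automatic, 1 = manual

# One-sided Welch t-test.
# H1: mean(Automatic) - mean(Manual) < 0, because "Automatic" is the first level.
one.sided <- t.test(mpg ~ am.label, data = dat, alternative = "less")

print(one.sided$estimate)  # group means
print(one.sided$p.value)   # one-sided p-value
```

Because the observed difference is in the hypothesized direction, the one-sided p-value is half of the two-sided one, so the conclusion does not change.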

Simple linear regression

MPG is the dependent variable and am is the independent variable in this linear regression.

lm.1 <- lm(mpg ~ am, data = mtcars)
summary(lm.1)
## 
## Call:
## lm(formula = mpg ~ am, data = mtcars)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -9.3923 -3.0923 -0.2974  3.2439  9.5077 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   17.147      1.125  15.247 1.13e-15 ***
## am1            7.245      1.764   4.106 0.000285 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 4.902 on 30 degrees of freedom
## Multiple R-squared:  0.3598, Adjusted R-squared:  0.3385 
## F-statistic: 16.86 on 1 and 30 DF,  p-value: 0.000285

The p-value for the am1 coefficient is 0.000285, so we reject the null hypothesis that transmission type has no effect on MPG: manual cars average 7.245 MPG more than automatic cars. However, the adjusted R-squared is only 0.3385, which means the model explains only about 34% of the variance in MPG. For that reason, a multivariable linear regression should be fitted.
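With a single two-level factor as predictor, the slope of the regression is simply the difference between the two group means, and the intercept is the mean of the reference group (automatic). A short sketch of that check, using illustrative names (dat, fit, mean.diff, slope) that do not appear in the original analysis:

```r
# Pristine copy of the data; all names below are illustrative.
dat <- datasets::mtcars
dat$am <- factor(dat$am)  # 0 = automatic, 1 = manual

fit <- lm(mpg ~ am, data = dat)

# Mean MPG per transmission type, indexed by factor level "0" and "1".
group.means <- tapply(dat$mpg, dat$am, mean)

# Difference in means (manual minus automatic) versus the fitted slope.
mean.diff <- unname(group.means["1"] - group.means["0"])
slope <- unname(coef(fit)["am1"])

print(c(mean.diff = mean.diff, slope = slope))
```

The two numbers agree, which is why the simple regression and the comparison of group means tell the same story here.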

Multivariable Regression Model

The Appendix presents a pairs plot showing how the other variables correlate with mpg. The variables cyl, disp, hp and wt have the strongest correlations with mpg, so we include them in the model.
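The ranking of correlations can also be checked numerically rather than read off the pairs plot. A minimal sketch, assuming Pearson correlation on the untransformed dataset (all columns numeric); the names raw, mpg.cor and ranked are introduced here for illustration:

```r
# Use the untouched dataset, since cor() requires numeric columns.
raw <- datasets::mtcars

# Pearson correlation of every other variable with mpg.
mpg.cor <- cor(raw)["mpg", colnames(raw) != "mpg"]

# Rank by absolute strength, strongest first.
ranked <- mpg.cor[order(abs(mpg.cor), decreasing = TRUE)]
print(round(ranked, 3))
```

The first four entries of ranked should be the variables chosen for the model.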

multi <- lm(mpg~am + cyl + disp + hp + wt, data = mtcars)
summary(multi)
## 
## Call:
## lm(formula = mpg ~ am + cyl + disp + hp + wt, data = mtcars)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -3.5952 -1.5864 -0.7157  1.2821  5.5725 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 38.20280    3.66910  10.412 9.08e-11 ***
## am1          1.55649    1.44054   1.080  0.28984    
## cyl         -1.10638    0.67636  -1.636  0.11393    
## disp         0.01226    0.01171   1.047  0.30472    
## hp          -0.02796    0.01392  -2.008  0.05510 .  
## wt          -3.30262    1.13364  -2.913  0.00726 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2.505 on 26 degrees of freedom
## Multiple R-squared:  0.8551, Adjusted R-squared:  0.8273 
## F-statistic:  30.7 on 5 and 26 DF,  p-value: 4.029e-10

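The model explains about 83% of the variance in MPG (adjusted R-squared 0.8273), so cyl, disp, hp and wt account for much of the apparent relationship between mpg and am. After adjusting for these variables, the estimated difference between manual and automatic transmissions shrinks from 7.245 to about 1.56 MPG, and it is no longer statistically significant (p = 0.29).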
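Two further checks may help interpret this result: an F-test comparing the nested models, and a confidence interval for the transmission coefficient. This is a sketch under the same model formulas as above; the names dat, small, full, cmp and am.ci are illustrative and not part of the original analysis.

```r
# Pristine copy of the data; all names below are illustrative.
dat <- datasets::mtcars
dat$am <- factor(dat$am)  # 0 = automatic, 1 = manual

small <- lm(mpg ~ am, data = dat)
full  <- lm(mpg ~ am + cyl + disp + hp + wt, data = dat)

# F-test on the nested models: do the four added covariates improve the fit?
cmp <- anova(small, full)
print(cmp)

# 95% confidence interval for the manual-vs-automatic effect in the full model.
am.ci <- confint(full)["am1", ]
print(am.ci)
```

If the interval for am1 spans zero, the data are compatible with no transmission effect once the other variables are held fixed, which is consistent with the p-value of 0.29 in the summary.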

Plot 2 in the Appendix shows the residual diagnostics.

Appendix

Plot 1

pairs(mpg ~ ., data = mtcars)

Plot 2

par(mfrow = c(2, 2))
plot(multi)