knitr::opts_chunk$set(echo = TRUE)

Executive Summary

In this project we explore how several features of a car affect its fuel consumption, measured in miles per gallon (MPG), with a focus on automatic versus manual transmissions (am).

We use the mtcars dataset (Motor Trend Car Road Tests), extracted from the 1974 Motor Trend US magazine, and look at the relationship between a set of variables and MPG.

In particular, we answer two questions: whether an automatic or a manual transmission is better for MPG, and how large the MPG difference between the two transmission types is.

Manual transmission is associated with higher MPG than automatic. On average, cars with a manual transmission achieve about 24 mpg, versus about 17 mpg for cars with an automatic. Holding weight (wt) and 1/4 mile time (qsec) constant, a manual transmission is associated with an increase of about 2.94 mpg over an automatic.

Exploratory Data Analysis

data(mtcars)
head(mtcars)
dim(mtcars)
## [1] 32 11

The dataset contains 32 observations of 11 variables.

We convert the categorical variables to factors so that they are easier to work with, and add a labelled copy of am for tables and plots.

mtcars$vs <- factor(mtcars$vs)
mtcars$am.label <- factor(mtcars$am, labels=c("Automatic","Manual")) # 0=automatic, 1=manual
mtcars$gear <- factor(mtcars$gear)
mtcars$carb <- factor(mtcars$carb)
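As a quick sanity check on the recoding, a cross-tabulation should show every am = 0 row labelled Automatic and every am = 1 row labelled Manual. factor() sorts the numeric levels (0 before 1), so the labels are assigned in that order. This is a minimal sketch that reloads the data and repeats only the am recoding so it runs on its own; the name `check` is illustrative.

```r
# Reload the data so this check runs on its own
data(mtcars)

# factor() orders the levels as 0, 1, so "Automatic" pairs with 0 and "Manual" with 1
mtcars$am.label <- factor(mtcars$am, labels = c("Automatic", "Manual"))

# Cross-tabulate the numeric code against the label: only the diagonal should be non-zero
check <- table(code = mtcars$am, label = mtcars$am.label)
print(check)
```

In mtcars this gives 19 automatic and 13 manual cars.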

We will calculate mean MPG values for cars with Automatic and Manual transmission:

aggregate(mtcars$mpg,by=list(mtcars$am.label),FUN=mean)

We can see that cars with a manual transmission average about 7 MPG more than cars with an automatic. Let's now test this difference with a simple linear regression:

T_simple <- lm(mpg ~ am, data=mtcars)
summary(T_simple)
## 
## Call:
## lm(formula = mpg ~ am, data = mtcars)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -9.3923 -3.0923 -0.2974  3.2439  9.5077 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   17.147      1.125  15.247 1.13e-15 ***
## am             7.245      1.764   4.106 0.000285 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 4.902 on 30 degrees of freedom
## Multiple R-squared:  0.3598, Adjusted R-squared:  0.3385 
## F-statistic: 16.86 on 1 and 30 DF,  p-value: 0.000285

The p-value for am is about 0.0003, well below 0.001, so we reject the null hypothesis that automatic and manual cars have the same mean MPG.
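A regression on a single 0/1 predictor is equivalent to a two-sample t-test that assumes equal variances in both groups, so the two should give the same p-value. The sketch below checks this; the object names (`fit_simple`, `p_lm`, `tt`) are illustrative and not part of the report.

```r
data(mtcars)

# Simple regression of mpg on the 0/1 transmission indicator
fit_simple <- lm(mpg ~ am, data = mtcars)
p_lm <- summary(fit_simple)$coefficients["am", "Pr(>|t|)"]

# Two-sample t-test with a pooled (equal) variance; t.test() turns am into a 2-level factor
tt <- t.test(mpg ~ am, data = mtcars, var.equal = TRUE)

# Same t statistic (up to sign) on 30 degrees of freedom, so the p-values agree
print(c(lm = p_lm, t_test = tt$p.value))
```

Welch's test (the t.test() default, var.equal = FALSE) drops the equal-variance assumption and gives a somewhat larger p-value, which is still well below 0.05.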

The fitted model is mpg = 17.1 + 7.2 * am + e, where am = 0 for automatic, am = 1 for manual, and e is the error term.

We interpret this as follows:

For automatic transmission (am = 0) the estimated mean mpg is 17.1. For manual transmission (am = 1) the estimated mean is 7.2 mpg higher, about 24.4 mpg.
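With a single binary predictor, the intercept is the mean of the am = 0 group and intercept + slope is the mean of the am = 1 group. A short sketch confirming this with predict(); `fit_simple`, `pred` and `grp` are illustrative names.

```r
data(mtcars)
fit_simple <- lm(mpg ~ am, data = mtcars)

# Predicted mean mpg for am = 0 (automatic) and am = 1 (manual)
pred <- predict(fit_simple, newdata = data.frame(am = c(0, 1)))

# Observed group means, ordered am = 0 then am = 1
grp <- tapply(mtcars$mpg, mtcars$am, mean)

print(cbind(am = c(0, 1), predicted = unname(pred), observed = as.numeric(grp)))
```

Both columns come out at about 17.15 and 24.39 mpg.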

From this model alone we would conclude that manual transmission is better for miles per gallon, but its R-squared is only about 0.36, meaning that transmission type alone explains only about a third of the variance in MPG. To find a better model, we use stepwise selection by AIC with the step() function, starting from the model with all variables (excluding am.label, which duplicates am). For brevity we do not show every step, only the summary of the selected model:

select_model <- step(lm(mpg ~ . - am.label, data = mtcars), direction = "both")
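step() prints every intermediate model by default; setting trace = 0 suppresses that output and returns only the final fit. The sketch below assumes the unmodified, all-numeric mtcars (none of the factor recoding above), so its search path may differ slightly from the report's; `full` and `chosen` are illustrative names.

```r
data(mtcars)  # all columns numeric; no factor recoding

# Start from the model with every predictor and search in both directions by AIC;
# trace = 0 hides the step-by-step printout
full <- lm(mpg ~ ., data = mtcars)
chosen <- step(full, direction = "both", trace = 0)

print(formula(chosen))
```

On the numeric data this also ends at mpg ~ wt + qsec + am.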

summary(select_model)
## 
## Call:
## lm(formula = mpg ~ wt + qsec + am, data = mtcars)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -3.4811 -1.5555 -0.7257  1.4110  4.6610 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   9.6178     6.9596   1.382 0.177915    
## wt           -3.9165     0.7112  -5.507 6.95e-06 ***
## qsec          1.2259     0.2887   4.247 0.000216 ***
## am            2.9358     1.4109   2.081 0.046716 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2.459 on 28 degrees of freedom
## Multiple R-squared:  0.8497, Adjusted R-squared:  0.8336 
## F-statistic: 52.75 on 3 and 28 DF,  p-value: 1.21e-11

The selected model is: lm(formula = mpg ~ wt + qsec + am, data = mtcars)

It includes transmission type, which is our variable of interest, so we keep this model.
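Because the transmission-only model is nested inside the selected one, an analysis of variance (F-test) can check whether adding wt and qsec improves the fit significantly. A minimal sketch, with illustrative names `fit_simple`, `fit_multi` and `cmp`:

```r
data(mtcars)

# Nested models: transmission only vs. transmission plus weight and 1/4 mile time
fit_simple <- lm(mpg ~ am, data = mtcars)
fit_multi  <- lm(mpg ~ wt + qsec + am, data = mtcars)

# F-test of whether wt and qsec jointly reduce the residual sum of squares
cmp <- anova(fit_simple, fit_multi)
print(cmp)
```

The second row compares the models on 2 extra degrees of freedom; a very small p-value favours the larger model.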

Multi_var <- lm(mpg ~ wt + qsec + am, data = mtcars)
summary(Multi_var)
## 
## Call:
## lm(formula = mpg ~ wt + qsec + am, data = mtcars)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -3.4811 -1.5555 -0.7257  1.4110  4.6610 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   9.6178     6.9596   1.382 0.177915    
## wt           -3.9165     0.7112  -5.507 6.95e-06 ***
## qsec          1.2259     0.2887   4.247 0.000216 ***
## am            2.9358     1.4109   2.081 0.046716 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2.459 on 28 degrees of freedom
## Multiple R-squared:  0.8497, Adjusted R-squared:  0.8336 
## F-statistic: 52.75 on 3 and 28 DF,  p-value: 1.21e-11

This model explains about 85% of the variance in MPG (adjusted R-squared 0.83). Interpreting the am coefficient "ceteris paribus" (holding weight (wt) and 1/4 mile time (qsec) constant), a manual transmission is associated with an increase of 2.94 mpg over an automatic; this coefficient is significant at the 5% level (p = 0.047).
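Since the am coefficient is only just significant at the 5% level, a confidence interval shows how uncertain the 2.94 mpg estimate is. A short sketch using confint(); `fit_multi` and `ci` are illustrative names.

```r
data(mtcars)
fit_multi <- lm(mpg ~ wt + qsec + am, data = mtcars)

# 95% confidence intervals for all coefficients (t distribution, 28 residual df)
ci <- confint(fit_multi, level = 0.95)
print(round(ci, 3))
```

The am interval runs from roughly 0.05 to 5.83 mpg: it excludes zero, but it is wide.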

Residual Analysis

# Four standard diagnostic plots for the lm fit (base graphics), in a 2x2 grid
par(mfrow = c(2, 2))
plot(Multi_var)

- The Normal Q-Q plot shows the points lying close to the line, suggesting that the residuals are approximately normally distributed.

- The Scale-Location plot shows a roughly constant band, consistent with constant variance.

- The Residuals vs Fitted plot shows no obvious pattern in the spread of the residuals, consistent with homoscedasticity.
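A formal test can complement the visual check of normality. The sketch below applies the Shapiro-Wilk test to the model residuals; a large p-value is consistent with (but does not prove) normality, and with only 32 observations the test has limited power. `fit_multi` and `sw` are illustrative names.

```r
data(mtcars)
fit_multi <- lm(mpg ~ wt + qsec + am, data = mtcars)

# Shapiro-Wilk test: the null hypothesis is that the residuals are normally distributed
sw <- shapiro.test(residuals(fit_multi))
print(sw)
```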

Appendix

Boxplot: distribution of "mpg" for automatic and manual transmissions (the line inside each box marks the median).

# am.label: Automatic = 0, Manual = 1
boxplot(mpg ~ am.label, data = mtcars, col = c("red", "blue"),
        ylab = "MPG (Miles Per Gallon)", xlab = "Transmission Type")
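A boxplot draws the group medians rather than the means, so a small table of both helps connect the figure to the means reported earlier. A minimal sketch; `summ` is an illustrative name.

```r
data(mtcars)
mtcars$am.label <- factor(mtcars$am, labels = c("Automatic", "Manual"))

# Median (the thick line in each box) and mean of mpg for each transmission type
summ <- sapply(split(mtcars$mpg, mtcars$am.label),
               function(x) c(median = median(x), mean = mean(x)))
print(round(summ, 2))
```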