Assignment Description:

Motor Trend, a magazine about the automobile industry, is interested in the relationship between a set of variables and miles per gallon (MPG), the outcome. They want answers to the following two questions, which are addressed in this report:

Is an automatic or manual transmission better for MPG? What is the MPG difference between automatic and manual transmissions?

Executive Summary:

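Using simple and multivariate linear regression, this report shows that mean mpg differs between automatic and manual transmissions. In the simple model, manual cars are on average 7.245 mpg more economical than automatic ones. After adjusting for weight (wt) and horsepower (hp), the estimated difference shrinks to 2.084 mpg and is no longer significant at the 0.05 level.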

Loading the packages:

library(knitr)
library(ggplot2)

Reading the data and checking its column classes:

mt <- mtcars
head(mt,10)
##                    mpg cyl  disp  hp drat    wt  qsec vs am gear carb
## Mazda RX4         21.0   6 160.0 110 3.90 2.620 16.46  0  1    4    4
## Mazda RX4 Wag     21.0   6 160.0 110 3.90 2.875 17.02  0  1    4    4
## Datsun 710        22.8   4 108.0  93 3.85 2.320 18.61  1  1    4    1
## Hornet 4 Drive    21.4   6 258.0 110 3.08 3.215 19.44  1  0    3    1
## Hornet Sportabout 18.7   8 360.0 175 3.15 3.440 17.02  0  0    3    2
## Valiant           18.1   6 225.0 105 2.76 3.460 20.22  1  0    3    1
## Duster 360        14.3   8 360.0 245 3.21 3.570 15.84  0  0    3    4
## Merc 240D         24.4   4 146.7  62 3.69 3.190 20.00  1  0    4    2
## Merc 230          22.8   4 140.8  95 3.92 3.150 22.90  1  0    4    2
## Merc 280          19.2   6 167.6 123 3.92 3.440 18.30  1  0    4    4
cs <- sapply(mt,class)
cs
##       mpg       cyl      disp        hp      drat        wt      qsec 
## "numeric" "numeric" "numeric" "numeric" "numeric" "numeric" "numeric" 
##        vs        am      gear      carb 
## "numeric" "numeric" "numeric" "numeric"

Converting transmission type to factor:

mt$am <- as.factor(mt$am)

levels(mt$am) <- c("Automatic","Manual")

head(mt,6)
##                    mpg cyl disp  hp drat    wt  qsec vs        am gear
## Mazda RX4         21.0   6  160 110 3.90 2.620 16.46  0    Manual    4
## Mazda RX4 Wag     21.0   6  160 110 3.90 2.875 17.02  0    Manual    4
## Datsun 710        22.8   4  108  93 3.85 2.320 18.61  1    Manual    4
## Hornet 4 Drive    21.4   6  258 110 3.08 3.215 19.44  1 Automatic    3
## Hornet Sportabout 18.7   8  360 175 3.15 3.440 17.02  0 Automatic    3
## Valiant           18.1   6  225 105 2.76 3.460 20.22  1 Automatic    3
##                   carb
## Mazda RX4            4
## Mazda RX4 Wag        4
## Datsun 710           1
## Hornet 4 Drive       1
## Hornet Sportabout    2
## Valiant              1

Regression Analysis

The effect of a car’s transmission type on mpg is shown as a box plot (figure 1 of the Appendix). The mean mpg for each transmission type is:

aggregate(mpg~am, data = mt, mean)
##          am      mpg
## 1 Automatic 17.14737
## 2    Manual 24.39231
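
Figure 1 itself is not reproduced in this text. The following is a minimal sketch of how such a box plot could be drawn. It assumes base R graphics (the original figure may well have been made with ggplot2) and rebuilds the mt data frame so the snippet runs on its own; the names bp, group_sizes and group_medians are illustrative only.

```r
# Rebuild the data frame used in the report so this snippet is self-contained.
mt <- mtcars
mt$am <- factor(mt$am, labels = c("Automatic", "Manual"))

# Box plot of mpg by transmission type. boxplot() also returns the
# summary statistics it draws, which we keep for inspection.
bp <- boxplot(mpg ~ am, data = mt,
              xlab = "Transmission", ylab = "Miles per gallon",
              main = "MPG by transmission type")

# Number of cars and median mpg behind each box
# (row 3 of bp$stats holds the medians).
group_sizes <- setNames(bp$n, bp$names)
group_medians <- setNames(bp$stats[3, ], bp$names)
group_sizes
group_medians
```

Looking at the medians alongside the means is a quick check that the gap between the two groups is not driven by a few extreme cars.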

Using a correlation matrix, I will determine the predictors. The full correlation matrix is plotted in figure 2 of the Appendix.

mt1 <- sort(cor(mtcars)[1,])
mt1
##         wt        cyl       disp         hp       carb       qsec 
## -0.8676594 -0.8521620 -0.8475514 -0.7761684 -0.5509251  0.4186840 
##       gear         am         vs       drat        mpg 
##  0.4802848  0.5998324  0.6640389  0.6811719  1.0000000

From these values and the plot in figure 2, it is clear that wt has the strongest correlation with mpg in absolute value (-0.868, regardless of its sign). After this first analysis, I’ll begin the linear regression.
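
Since the strength of a correlation is judged by its absolute value, the ranking can be made explicit. This is a small sketch, not part of the original analysis; cors, ranked and strongest are illustrative names.

```r
# Correlation of every other variable with mpg
# (column 1 is mpg itself, so it is dropped).
cors <- cor(mtcars)["mpg", -1]

# Rank the variables by strength of correlation, ignoring the sign.
ranked <- sort(abs(cors), decreasing = TRUE)
round(ranked, 3)

# Name of the variable most strongly correlated with mpg.
strongest <- names(ranked)[1]
strongest
```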

Simple Linear Regression

fit <- lm(mpg~am, data = mt)
summary(fit)
## 
## Call:
## lm(formula = mpg ~ am, data = mt)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -9.3923 -3.0923 -0.2974  3.2439  9.5077 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   17.147      1.125  15.247 1.13e-15 ***
## amManual       7.245      1.764   4.106 0.000285 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 4.902 on 30 degrees of freedom
## Multiple R-squared:  0.3598, Adjusted R-squared:  0.3385 
## F-statistic: 16.86 on 1 and 30 DF,  p-value: 0.000285

This summary includes the hypothesis tests for the model coefficients. From the intercept and the amManual coefficient, we see that automatic cars average 17.147 mpg, while cars with a manual transmission get 7.245 more miles per gallon. The R-squared value is almost 0.36, which means the model explains only 36% of the variance.
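
With a single two-level factor as predictor, the regression simply reproduces the group means: the intercept is the automatic mean and the intercept plus the amManual coefficient is the manual mean. A short sketch that checks this and adds a 95% confidence interval for the difference (the interval is an addition, not part of the original report; group_means, b and ci are illustrative names):

```r
# Rebuild the data and the simple model so the snippet runs on its own.
mt <- mtcars
mt$am <- factor(mt$am, labels = c("Automatic", "Manual"))
fit <- lm(mpg ~ am, data = mt)

# Mean mpg per transmission type.
group_means <- tapply(mt$mpg, mt$am, mean)
b <- coef(fit)

# Intercept equals the automatic mean;
# intercept + slope equals the manual mean.
all.equal(unname(b["(Intercept)"]), unname(group_means["Automatic"]))
all.equal(unname(b["(Intercept)"] + b["amManual"]), unname(group_means["Manual"]))

# 95% confidence interval for the manual-minus-automatic difference.
ci <- confint(fit)["amManual", ]
ci
```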

Multivariate Linear Regression

Next, we fit a multivariate linear regression of mpg on am with wt and hp as additional predictors, and compare it with the simple model using ANOVA.

mult <- lm(mpg ~ am + wt + hp, data = mt)

anova(fit, mult)
## Analysis of Variance Table
## 
## Model 1: mpg ~ am
## Model 2: mpg ~ am + wt + hp
##   Res.Df    RSS Df Sum of Sq      F    Pr(>F)    
## 1     30 720.90                                  
## 2     28 180.29  2    540.61 41.979 3.745e-09 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

We use 0.05 as the significance level (type I error rate). The comparison gives a very small p-value of 3.745e-09, so we reject the null hypothesis that adding wt and hp does not improve on the initial model: the multivariate model fits significantly better.
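
The F statistic in the table can be reproduced by hand from the two residual sums of squares: F = ((RSS1 - RSS2) / 2) / (RSS2 / 28), where 2 is the number of added predictors and 28 the residual degrees of freedom of the larger model. A sketch of that arithmetic, with illustrative variable names:

```r
# Rebuild the data and both models so the snippet runs on its own.
mt <- mtcars
mt$am <- factor(mt$am, labels = c("Automatic", "Manual"))
fit <- lm(mpg ~ am, data = mt)
mult <- lm(mpg ~ am + wt + hp, data = mt)

# Residual sums of squares of the two nested models.
rss1 <- deviance(fit)
rss2 <- deviance(mult)

# Number of predictors added, and residual df of the larger model.
df_extra <- df.residual(fit) - df.residual(mult)
df_resid <- df.residual(mult)

# F statistic for the nested-model comparison and its p-value.
f_stat <- ((rss1 - rss2) / df_extra) / (rss2 / df_resid)
p_val <- pf(f_stat, df_extra, df_resid, lower.tail = FALSE)
c(F = f_stat, p = p_val)
```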

Residual Plot

As part of this work, I need to check whether the residuals of the multivariate model are well behaved. The diagnostics are plotted in figure 3.
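
Figure 3 is not reproduced in this text. A minimal sketch of how the residual diagnostics could be drawn, assuming the standard plot method for lm objects in base R (the original figure may have been produced differently):

```r
# Rebuild the data and the multivariate model so the snippet runs on its own.
mt <- mtcars
mt$am <- factor(mt$am, labels = c("Automatic", "Manual"))
mult <- lm(mpg ~ am + wt + hp, data = mt)

# Four standard diagnostic panels in a 2 x 2 grid: residuals vs fitted,
# normal Q-Q, scale-location, and residuals vs leverage.
old_par <- par(mfrow = c(2, 2))
plot(mult)
par(old_par)

# The raw residuals, for any further checks.
res <- resid(mult)
summary(res)
```

In these panels one would look for residuals scattered without pattern around zero and points close to the line in the Q-Q plot.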

summary(mult)
## 
## Call:
## lm(formula = mpg ~ am + wt + hp, data = mt)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -3.4221 -1.7924 -0.3788  1.2249  5.5317 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 34.002875   2.642659  12.867 2.82e-13 ***
## amManual     2.083710   1.376420   1.514 0.141268    
## wt          -2.878575   0.904971  -3.181 0.003574 ** 
## hp          -0.037479   0.009605  -3.902 0.000546 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2.538 on 28 degrees of freedom
## Multiple R-squared:  0.8399, Adjusted R-squared:  0.8227 
## F-statistic: 48.96 on 3 and 28 DF,  p-value: 2.908e-11

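The second model explains 84% (rounded) of the variance, as indicated by R-squared. The am coefficient drops from 7.245 to 2.084 once wt and hp are included, which shows that wt and hp confound the relationship between am and mpg. Holding wt and hp fixed, manual transmission cars get an estimated 2.084 (rounded) more mpg than automatic ones, but this difference is not statistically significant at the 0.05 level (p = 0.141).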
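
To quantify the uncertainty around the adjusted difference, a 95% confidence interval for the amManual coefficient can be computed. This is a sketch that goes slightly beyond the original report; am_est, am_ci and covers_zero are illustrative names.

```r
# Rebuild the data and the multivariate model so the snippet runs on its own.
mt <- mtcars
mt$am <- factor(mt$am, labels = c("Automatic", "Manual"))
mult <- lm(mpg ~ am + wt + hp, data = mt)

# Adjusted estimate of the manual-minus-automatic difference in mpg.
am_est <- coef(mult)[["amManual"]]

# 95% confidence interval for that difference.
am_ci <- confint(mult, "amManual", level = 0.95)
am_ci

# TRUE when the interval includes zero, i.e. the adjusted difference
# is not significant at the 0.05 level.
covers_zero <- am_ci[1, 1] < 0 && am_ci[1, 2] > 0
covers_zero
```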

Appendix