Executive Summary

For this project, we look at the mtcars data set, a collection of cars, and explore the relationship between a set of variables and the outcome, miles per gallon (MPG). The two key questions that we want to answer are:

Exploratory Data Analysis

Before doing any deep analysis, let’s take a quick look into the data.

What we are trying to analyze is how Transmission predicts MPG.

The variables we want to keep a close eye on are “am” (Transmission: 0 = automatic, 1 = manual) and “mpg”.

We’ll do a linear regression for only Transmission:

# Simple linear regression of mpg on transmission type (am: 0 = automatic, 1 = manual)
fit <- lm(mpg ~ am, data = mydata)
coef(fit)
## (Intercept)          am 
##   17.147368    7.244939

The results show that cars with an automatic transmission average 17.15 mpg, and cars with a manual transmission average 7.24 mpg more. This initial analysis ignores the rest of the variables.
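As a quick check on this interpretation: when the only predictor is a 0/1 variable, the intercept is the mean of the 0 group and the slope is the difference between the two group means. The sketch below assumes that mydata is an unmodified copy of R’s built-in mtcars data set, since the report does not show how mydata was created.

```r
# Assumption: mydata in the report is a copy of the built-in mtcars data set.
group_means <- tapply(mtcars$mpg, mtcars$am, mean)  # names are "0" and "1"

auto_mean   <- group_means[["0"]]        # mean mpg, automatic transmission
manual_mean <- group_means[["1"]]        # mean mpg, manual transmission
mean_diff   <- manual_mean - auto_mean   # should match the "am" coefficient

round(c(automatic = auto_mean, manual = manual_mean, difference = mean_diff), 2)
```

The automatic mean and the difference should agree with the intercept and the am coefficient reported above.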

Before moving forward, let’s see how the correlation between am and mpg compares with the correlations of the other variables. Appendix 1 shows the correlations as percentages and indicates that Transmission and MPG have a correlation of only 60%, compared with variables like Weight (-87%), Cyl (-85%) and Displacement (-85%).
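The percentages quoted here can be reproduced with a short sketch like the one below. It assumes the built-in mtcars data with every column still numeric; the variable names cor_with_mpg and cor_pct are only illustrative and do not come from the report.

```r
# Assumption: the correlations are computed on the built-in mtcars data,
# before any column is converted to a factor.
cor_with_mpg <- cor(mtcars)["mpg", ]   # correlation of every variable with mpg

# Drop mpg itself and express the rest as whole percentages.
cor_pct <- round(100 * cor_with_mpg[names(cor_with_mpg) != "mpg"])

# Strongest relationships (by absolute value) first.
cor_pct[order(abs(cor_pct), decreasing = TRUE)]
```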

We’ll turn the applicable variables into factors and run another regression that takes all the variables into consideration; that way we understand better the impact of each on MPG.
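The report does not show this step, so the following is only a sketch of what it might look like. The regression output confirms that cyl and am were treated as factors; treating vs, gear and carb as factors too is an assumption, as is starting from a copy of mtcars. The name allvar matches the object passed to step() later in the report.

```r
# Assumption: mydata starts as a copy of mtcars, and the "applicable" variables
# are the discrete ones. Only cyl and am are confirmed by the report's output.
mydata <- mtcars
factor_cols <- c("cyl", "vs", "am", "gear", "carb")
mydata[factor_cols] <- lapply(mydata[factor_cols], factor)

# Regression of mpg on every other variable in the data set.
allvar <- lm(mpg ~ ., data = mydata)
summary(allvar)$coefficients
```

With factors, R reports one coefficient per non-baseline level (for example cyl6 and cyl8 relative to 4 cylinders), which is why am appears as am1 in later output.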

Model Selection

Based on the quick exploratory analysis, we have found other variables that correlate more strongly with MPG, so we now need to determine which model best fits our analysis.

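# Stepwise selection by AIC in both directions, starting from the model with all variables (allvar)
bestMod <- step(allvar, direction = "both", trace = 0)
summary(bestMod)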
## 
## Call:
## lm(formula = mpg ~ cyl + hp + wt + am, data = mydata)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -3.9387 -1.2560 -0.4013  1.1253  5.0513 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 33.70832    2.60489  12.940 7.73e-13 ***
## cyl6        -3.03134    1.40728  -2.154  0.04068 *  
## cyl8        -2.16368    2.28425  -0.947  0.35225    
## hp          -0.03211    0.01369  -2.345  0.02693 *  
## wt          -2.49683    0.88559  -2.819  0.00908 ** 
## am1          1.80921    1.39630   1.296  0.20646    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2.41 on 26 degrees of freedom
## Multiple R-squared:  0.8659, Adjusted R-squared:  0.8401 
## F-statistic: 33.57 on 5 and 26 DF,  p-value: 1.506e-10

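Based on these results, the selected model is mpg ~ cyl + hp + wt + am. Adjusted R^2 is 0.84, which means that the model explains about 84% of the variability in MPG after adjusting for the number of predictors. Note that in this model the am1 coefficient (1.81) has a p-value of 0.21, so the transmission effect is not significant at the 5% level once cyl, hp and wt are accounted for. Now, let’s compare both models: the one using only “am” and the one using “cyl”, “hp”, “wt” and “am”.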
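To see where the adjusted figure comes from, the arithmetic can be redone by hand from the numbers printed in the summary. This is only an illustrative check using the standard adjusted R-squared formula; the variable names are not from the report.

```r
# Values read from the summary output above.
r2       <- 0.8659   # Multiple R-squared
df_resid <- 26       # residual degrees of freedom
p        <- 5        # coefficients excluding the intercept ("on 5 and 26 DF")
n        <- df_resid + p + 1   # number of observations

# Adjusted R-squared penalises R-squared for the number of predictors.
adj_r2 <- 1 - (1 - r2) * (n - 1) / (n - p - 1)
round(adj_r2, 4)
```

The result should agree with the Adjusted R-squared line of the summary to the four digits shown there.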

anova(bestMod, fit)
## Analysis of Variance Table
## 
## Model 1: mpg ~ cyl + hp + wt + am
## Model 2: mpg ~ am
##   Res.Df    RSS Df Sum of Sq      F    Pr(>F)    
## 1     26 151.03                                  
## 2     30 720.90 -4   -569.87 24.527 1.688e-08 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

The p-value demonstrates that the larger model fits significantly better. We reject the null hypothesis that the variables “cyl”, “hp” and “wt” do not contribute to the model.
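The F statistic in the table can be reconstructed from the two residual sums of squares, which may help in reading the output. The sketch below uses only the rounded numbers printed in the anova table, so the result can differ from the table in the last digit; the variable names are illustrative.

```r
# Values read from the anova table above.
rss_full    <- 151.03   # mpg ~ cyl + hp + wt + am
rss_reduced <- 720.90   # mpg ~ am
df_full     <- 26       # residual df of the larger model
df_reduced  <- 30       # residual df of the smaller model

# Extra-sum-of-squares F test: improvement per added coefficient,
# relative to the residual variance of the larger model.
extra_df <- df_reduced - df_full
f_stat   <- ((rss_reduced - rss_full) / extra_df) / (rss_full / df_full)
p_value  <- pf(f_stat, extra_df, df_full, lower.tail = FALSE)

c(F = f_stat, p = p_value)
```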

Residual plot and diagnostics

Please refer to Appendix 8 for the residual plots. From these plots we can see that:

Statistical Inference

If we refer to Appendix 4 we can see that mpg is approximately normally distributed. The t-test shows that the difference in MPG between manual and automatic transmissions is significant.

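# Welch two-sample t-test (the t.test default); named ttest to avoid masking base::t
ttest <- t.test(mpg ~ am, data = mydata)
ttest$p.value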
## [1] 0.001373638
hist(mydata$mpg, freq = FALSE, breaks = 15)
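A histogram is only a visual check of normality. A possible complement, sketched below, is a Shapiro-Wilk test alongside the same Welch t-test. The Shapiro-Wilk test is not part of the original report, and the sketch assumes mydata holds the same mpg and am values as the built-in mtcars data.

```r
# Assumption: mydata holds the same mpg and am values as the built-in mtcars.

# Shapiro-Wilk test (not in the report): a large p-value means there is
# no evidence against normality of mpg.
sw <- shapiro.test(mtcars$mpg)
sw$p.value

# Welch two-sample t-test (t.test's default, var.equal = FALSE).
welch <- t.test(mpg ~ am, data = mtcars)
welch$p.value    # the report gives 0.001373638
welch$conf.int   # 95% interval for mean(automatic) - mean(manual)
```

The confidence interval adds something the p-value alone does not: a range for the size of the unadjusted MPG difference between the two transmission types.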

Conclusions

Going back to the original questions that we wanted to resolve, we conclude the following:

Appendix

Appendix 1. Correlation Matrix

Appendix 2. Multivariable analysis

##    Estimate Std. Error  t value     Pr(>|t|)
## am 24.39231   3.956183 6.165616 7.666189e-07

Appendix 8. Residual Plots

##       Mazda RX4 Wag   Chrysler Imperial       Toyota Corona 
##           0.2496110           0.2611168           0.2777872 
## Lincoln Continental       Maserati Bora 
##           0.2936819           0.4713671
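The named values above look like the largest leverage (hat) values from the selected model, although the report does not show the call that produced them. A minimal sketch of how such values could be obtained follows. It assumes the model is refit on the built-in mtcars data, and the twice-the-average cutoff is a common rule of thumb rather than something taken from the report.

```r
# Assumption: the selected model is refit on the built-in mtcars data,
# with cyl and am treated as factors as in the summary output.
model <- lm(mpg ~ factor(cyl) + hp + wt + factor(am), data = mtcars)

leverage <- hatvalues(model)   # diagonal of the hat matrix, one value per car
tail(sort(leverage), 5)        # the five highest-leverage observations

# Rule of thumb (not from the report): flag leverage above twice the average,
# where the average leverage is p / n.
p <- length(coef(model))
n <- nrow(mtcars)
names(leverage)[leverage > 2 * p / n]
```

Calling plot(model) on the same object draws the standard residual diagnostic plots (residuals vs fitted, normal Q-Q, scale-location, residuals vs leverage).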