Overview

In this report we explore the relationship between a set of car features and miles per gallon (MPG), the outcome, using the mtcars dataset. We also try to answer the following questions: 1- Is an automatic or manual transmission better for MPG? 2- How large is the MPG difference between automatic and manual transmissions?

Analysis

Exploratory Data Analysis

1- The variables most strongly correlated with mpg (by absolute value) are wt, cyl, disp and hp, and all four correlations are negative.

cor(mtcars$mpg,mtcars)
##      mpg       cyl       disp         hp      drat         wt     qsec
## [1,]   1 -0.852162 -0.8475514 -0.7761684 0.6811719 -0.8676594 0.418684
##             vs        am      gear       carb
## [1,] 0.6640389 0.5998324 0.4802848 -0.5509251
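
As a minimal sketch, the ranking stated above can be checked by sorting the correlations by absolute value. The names mpgCor, ranked and topFour are illustrative and do not come from the original analysis.

```r
data(mtcars)

# Correlation of mpg with every other column (mpg itself is dropped)
mpgCor <- cor(mtcars$mpg, mtcars[, names(mtcars) != "mpg"])[1, ]

# Rank by strength of association, ignoring the sign
ranked <- mpgCor[order(abs(mpgCor), decreasing = TRUE)]
print(round(ranked, 3))

# The four strongest predictors by absolute correlation
topFour <- names(ranked)[1:4]
print(topFour)
```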

See the appendix for the ggpairs plot.

2- Next we make a violin plot of MPG for automatic and manual transmissions.

# factor(am) gives one violin per transmission type even if am is still coded 0/1
amMpgViolin <- ggplot(data = mtcars, aes(x = factor(am), y = mpg, fill = factor(am))) + geom_violin(colour = "black", size = 2)

The plot above shows that cars with manual transmission tend to have higher MPG than cars with automatic transmission. On this unadjusted comparison, manual is therefore better for MPG.
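
The group comparison behind the plot can also be summarised numerically. The sketch below assumes the usual mtcars coding (0 = automatic, 1 = manual, as documented in ?mtcars). The names amLabel, groupMeans and groupMedians are illustrative.

```r
data(mtcars)

# Assumed recoding of am: 0 = Automatic, 1 = Manual (see ?mtcars)
amLabel <- factor(mtcars$am, levels = c(0, 1),
                  labels = c("Automatic", "Manual"))

# Mean and median mpg per transmission type
groupMeans   <- tapply(mtcars$mpg, amLabel, mean)
groupMedians <- tapply(mtcars$mpg, amLabel, median)

print(round(groupMeans, 2))
print(groupMedians)
```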

3- Next we choose a model by adding the most highly correlated features one at a time and comparing the nested models with an ANOVA test.

fit1 <- lm(mpg ~ factor(cyl) + wt , data=mtcars)
fit2 <- lm(mpg ~ factor(cyl) + wt + hp, data=mtcars)
fit3 <- lm(mpg ~ factor(cyl) + wt + hp + am, data=mtcars)
fit4 <- lm(mpg ~ factor(cyl)*disp + wt + hp + am, data=mtcars)

The results are the adjusted R-squared values of fit1 to fit4, followed by the ANOVA table comparing fit1, fit2 and fit3.

## [1] 0.8200146
## [1] 0.8360668
## [1] 0.8400875
## [1] 0.8625561
## Analysis of Variance Table
## 
## Model 1: mpg ~ factor(cyl) + wt
## Model 2: mpg ~ factor(cyl) + wt + hp
## Model 3: mpg ~ factor(cyl) + wt + hp + am
##   Res.Df    RSS Df Sum of Sq      F  Pr(>F)  
## 1     28 183.06                              
## 2     27 160.78  1    22.281 3.8358 0.06098 .
## 3     26 151.03  1     9.752 1.6789 0.20646  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

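In the ANOVA table above, adding hp (p = 0.061) and then am (p = 0.206) does not significantly reduce the residual sum of squares at the 5% level. The adjusted R-squared still rises slightly with each added term, so am explains a small part of the variability. fit4, which also includes disp and its interaction with cyl, has the highest adjusted R-squared (0.863), so we select it as the best model, although it was not part of the ANOVA comparison. The residuals of this model don’t follow a specific pattern. See the appendix for the complete diagnostic plots.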
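
The code that produced the numbers above is not shown in the report. A minimal sketch that yields the same kind of output, assuming the four values are adjusted R-squared as their magnitudes suggest, could look like this. The fit3 versus fit4 comparison at the end is an addition for illustration, not part of the original analysis.

```r
data(mtcars)

fits <- list(
  fit1 = lm(mpg ~ factor(cyl) + wt, data = mtcars),
  fit2 = lm(mpg ~ factor(cyl) + wt + hp, data = mtcars),
  fit3 = lm(mpg ~ factor(cyl) + wt + hp + am, data = mtcars),
  fit4 = lm(mpg ~ factor(cyl) * disp + wt + hp + am, data = mtcars)
)

# Adjusted R-squared penalises extra terms, unlike plain R-squared
adjR2 <- sapply(fits, function(f) summary(f)$adj.r.squared)
print(adjR2)

# Sequential F-tests for the nested models fit1, fit2, fit3
print(anova(fits$fit1, fits$fit2, fits$fit3))

# fit3 is nested in fit4, so the same test can be extended (illustrative)
print(anova(fits$fit3, fits$fit4))
```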

Quantify the MPG difference between automatic and manual transmissions

1- We run a t-test comparing the mpg of the automatic and manual transmission groups.

t.test(mpg ~ am, data=mtcars)
## 
##  Welch Two Sample t-test
## 
## data:  mpg by am
## t = -3.7671, df = 18.332, p-value = 0.001374
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -11.280194  -3.209684
## sample estimates:
## mean in group Automatic    mean in group Manual 
##                17.14737                24.39231

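The test rejects the hypothesis that mean mpg is equal for automatic and manual cars (p = 0.0014). The sample means are 17.15 mpg for automatic and 24.39 mpg for manual, a difference of about 7.24 mpg. With 95% confidence, the mean mpg of automatic cars is between 3.21 and 11.28 mpg lower than that of manual cars.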
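
The difference and its interval can be read directly from the object returned by t.test. This is a minimal sketch using am in its raw 0/1 coding. The names tt, meanDiff and ci are illustrative.

```r
data(mtcars)

# Welch two-sample t-test, automatic (am = 0) vs manual (am = 1)
tt <- t.test(mpg ~ am, data = mtcars)

# Manual minus automatic difference in mean mpg
meanDiff <- unname(tt$estimate[2] - tt$estimate[1])

# t.test reports automatic minus manual, so flip the sign and the order
ci <- -rev(as.numeric(tt$conf.int))

print(round(meanDiff, 2))
print(round(ci, 2))
print(tt$p.value)
```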

Summary

We explored the relationship between different features and miles per gallon (MPG) as the outcome. We found that the best features to include in our model as predictors are cyl, disp, wt, hp and am. We concluded that cars with manual transmission have, on average, higher mpg than automatic ones: about 7.2 mpg higher in the unadjusted t-test, with a 95% confidence interval of 3.2 to 11.3 mpg.

Appendix

1- Explore different relations between variables and the correlation values between them

print(mtcarsPairs)
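
The construction of mtcarsPairs is not shown in the report. One plausible way to build it, assuming the GGally package and an illustrative choice of columns (pairCols is a hypothetical name), is sketched below.

```r
data(mtcars)

# Assumed column selection: the outcome plus the predictors discussed above
pairCols <- c("mpg", "wt", "cyl", "disp", "hp", "am")

# ggpairs comes from GGally, so only build the plot if it is installed
if (requireNamespace("GGally", quietly = TRUE)) {
  mtcarsPairs <- GGally::ggpairs(mtcars, columns = pairCols)
  print(mtcarsPairs)
}
```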

2- Detailed model residuals and leverage

plot(fit4)
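
Alongside the diagnostic plots, leverage can be inspected numerically. The sketch below uses the common rule of thumb that flags points whose hat value exceeds twice the average. That threshold is an assumption made here, not something the report uses.

```r
data(mtcars)

fit4 <- lm(mpg ~ factor(cyl) * disp + wt + hp + am, data = mtcars)

# Hat values measure how far each car's predictors are from the bulk of the data
lev <- hatvalues(fit4)

# Rule-of-thumb cutoff: twice the average leverage p / n
p <- fit4$rank
n <- nrow(mtcars)
highLev <- names(lev)[lev > 2 * p / n]

print(round(sort(lev, decreasing = TRUE)[1:5], 3))
print(highLev)
```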