Synopsis

In this project on the mtcars data set, we take the perspective of a magazine about the automobile industry. Using a data set of a collection of cars, we examine the relationship between a set of variables and miles per gallon (MPG), the outcome. The project addresses two questions:

Is an automatic or manual transmission better for MPG?

Quantify the MPG difference between automatic and manual transmissions

Load Necessary Libraries

library(ggplot2)
library(dplyr)
data(mtcars)
head(mtcars)
                   mpg cyl disp  hp drat    wt  qsec vs am gear carb
Mazda RX4         21.0   6  160 110 3.90 2.620 16.46  0  1    4    4
Mazda RX4 Wag     21.0   6  160 110 3.90 2.875 17.02  0  1    4    4
Datsun 710        22.8   4  108  93 3.85 2.320 18.61  1  1    4    1
Hornet 4 Drive    21.4   6  258 110 3.08 3.215 19.44  1  0    3    1
Hornet Sportabout 18.7   8  360 175 3.15 3.440 17.02  0  0    3    2
Valiant           18.1   6  225 105 2.76 3.460 20.22  1  0    3    1
str(mtcars)
'data.frame':   32 obs. of  11 variables:
 $ mpg : num  21 21 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2 ...
 $ cyl : num  6 6 4 6 8 6 8 4 4 6 ...
 $ disp: num  160 160 108 258 360 ...
 $ hp  : num  110 110 93 110 175 105 245 62 95 123 ...
 $ drat: num  3.9 3.9 3.85 3.08 3.15 2.76 3.21 3.69 3.92 3.92 ...
 $ wt  : num  2.62 2.88 2.32 3.21 3.44 ...
 $ qsec: num  16.5 17 18.6 19.4 17 ...
 $ vs  : num  0 0 1 1 0 1 0 1 1 1 ...
 $ am  : num  1 1 1 0 0 0 0 0 0 0 ...
 $ gear: num  4 4 4 3 3 3 3 4 4 4 ...
 $ carb: num  4 4 1 1 2 1 4 2 2 4 ...

For further exploration, we now convert the categorical variables into factors.

mtcars$cyl  <- factor(mtcars$cyl)
mtcars$vs   <- factor(mtcars$vs)
mtcars$gear <- factor(mtcars$gear)
mtcars$carb <- factor(mtcars$carb)
mtcars$am   <- factor(mtcars$am, labels = c("Automatic", "Manual"))
aggregate(mpg ~ am, mtcars, mean)
         am      mpg
1 Automatic 17.14737
2    Manual 24.39231
manualC    <- mtcars[mtcars$am == "Manual", ]
automaticC <- mtcars[mtcars$am == "Automatic", ]
t.test(manualC$mpg, automaticC$mpg)
Welch Two Sample t-test

data:  manualC$mpg and automaticC$mpg
t = 3.7671, df = 18.332, p-value = 0.001374
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
  3.209684 11.280194
sample estimates:
mean of x mean of y 
 24.39231  17.14737 
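The Welch statistic reported above can be reproduced by hand, which makes it clearer what the test measures: the difference in group means divided by a standard error that does not pool the two variances. The following is a minimal sketch, not part of the original analysis; the names `mpg_manual`, `mpg_automatic`, `se_diff`, `t_by_hand`, and `t_builtin` are illustrative, and the raw 0/1 coding of `am` in `datasets::mtcars` is used so that the block runs on its own.

```r
# Split mpg by transmission using the raw coding in datasets::mtcars
# (am == 0 is automatic, am == 1 is manual); names here are illustrative.
mpg_manual    <- datasets::mtcars$mpg[datasets::mtcars$am == 1]
mpg_automatic <- datasets::mtcars$mpg[datasets::mtcars$am == 0]

# Standard error of the difference in means, without pooling the variances
se_diff <- sqrt(var(mpg_manual) / length(mpg_manual) +
                var(mpg_automatic) / length(mpg_automatic))

# Welch t statistic: difference in means divided by its standard error
t_by_hand <- (mean(mpg_manual) - mean(mpg_automatic)) / se_diff

# Statistic computed by t.test() for comparison
t_builtin <- unname(t.test(mpg_manual, mpg_automatic)$statistic)

print(c(by_hand = t_by_hand, t.test = t_builtin))
```

Both values should agree with the `t = 3.7671` shown in the output above.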
Cdata<- lm(mpg ~ am, mtcars)
summary(Cdata)
Call:
lm(formula = mpg ~ am, data = mtcars)

Residuals:
    Min      1Q  Median      3Q     Max 
-9.3923 -3.0923 -0.2974  3.2439  9.5077 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)   17.147      1.125  15.247 1.13e-15 ***
amManual       7.245      1.764   4.106 0.000285 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 4.902 on 30 degrees of freedom
Multiple R-squared:  0.3598,    Adjusted R-squared:  0.3385 
F-statistic: 16.86 on 1 and 30 DF,  p-value: 0.000285
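With a single two-level factor as the predictor, the coefficients of this model have a direct reading: the intercept is the mean MPG of the reference group (Automatic) and `amManual` is the difference between the two group means. The sketch below checks this on a local copy of the data; `cars_b`, `group_means`, and `fit_simple` are illustrative names and are not part of the original analysis.

```r
# Work on a local copy so the sketch does not depend on earlier steps
cars_b <- datasets::mtcars
cars_b$am <- factor(cars_b$am, labels = c("Automatic", "Manual"))

# Mean mpg per transmission type
group_means <- tapply(cars_b$mpg, cars_b$am, mean)

# Simple regression of mpg on transmission type
fit_simple <- lm(mpg ~ am, data = cars_b)

# Intercept vs. mean of the reference group (Automatic)
print(c(intercept = coef(fit_simple)[["(Intercept)"]],
        automatic_mean = group_means[["Automatic"]]))

# Slope vs. difference in group means (Manual minus Automatic)
print(c(amManual = coef(fit_simple)[["amManual"]],
        mean_difference = group_means[["Manual"]] - group_means[["Automatic"]]))
```

The slope matches the 7.245 MPG estimate in the summary above. The p-value differs slightly from the Welch test because `lm` assumes equal variances in both groups.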
multifit_model<- lm(mpg~am + cyl + disp + hp + wt, data = mtcars)
anova(Cdata, multifit_model)
Analysis of Variance Table

Model 1: mpg ~ am
Model 2: mpg ~ am + cyl + disp + hp + wt
  Res.Df    RSS Df Sum of Sq      F    Pr(>F)    
1     30 720.90                                  
2     25 150.41  5    570.49 18.965 8.637e-08 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
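The F statistic in this table compares the drop in residual sum of squares per added parameter with the residual variance of the larger model. As a hedged sketch of that arithmetic, the block below refits both models on a local copy of the data and recomputes F from the two RSS values; `cars_c`, `small`, `large`, `f_by_hand`, and `f_anova` are illustrative names introduced here.

```r
# Local copy with the same factor coding used in the analysis
cars_c <- datasets::mtcars
cars_c$cyl <- factor(cars_c$cyl)
cars_c$am  <- factor(cars_c$am, labels = c("Automatic", "Manual"))

small <- lm(mpg ~ am, data = cars_c)
large <- lm(mpg ~ am + cyl + disp + hp + wt, data = cars_c)

# For lm objects, deviance() returns the residual sum of squares
rss_small <- deviance(small)
rss_large <- deviance(large)

# Number of extra parameters in the larger model (30 - 25 = 5)
df_extra <- df.residual(small) - df.residual(large)

# F = (reduction in RSS per extra parameter) / (residual variance of larger model)
f_by_hand <- ((rss_small - rss_large) / df_extra) /
  (rss_large / df.residual(large))

# F reported by anova() for the same comparison
f_anova <- anova(small, large)$F[2]

print(c(by_hand = f_by_hand, anova = f_anova))
```

With the rounded values from the table, this is (570.49 / 5) / (150.41 / 25), which is approximately 18.96.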
summary(multifit_model)
Call:
lm(formula = mpg ~ am + cyl + disp + hp + wt, data = mtcars)

Residuals:
    Min      1Q  Median      3Q     Max 
-3.9374 -1.3347 -0.3903  1.1910  5.0757 

Coefficients:
             Estimate Std. Error t value Pr(>|t|)    
(Intercept) 33.864276   2.695416  12.564 2.67e-12 ***
amManual     1.806099   1.421079   1.271   0.2155    
cyl6        -3.136067   1.469090  -2.135   0.0428 *  
cyl8        -2.717781   2.898149  -0.938   0.3573    
disp         0.004088   0.012767   0.320   0.7515    
hp          -0.032480   0.013983  -2.323   0.0286 *  
wt          -2.738695   1.175978  -2.329   0.0282 *  
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 2.453 on 25 degrees of freedom
Multiple R-squared:  0.8664,    Adjusted R-squared:  0.8344 
F-statistic: 27.03 on 6 and 25 DF,  p-value: 8.861e-10
ggplot(data = mtcars, aes(mpg)) +
  geom_histogram() +
  facet_grid(. ~ am) +
  labs(x = "Miles per Gallon", y = "Frequency",
       title = "MPG Histogram for AT and MT cars")


### Plotting the data with a boxplot to explain MPG by transmission type

boxplot(mpg ~ am, data = mtcars, col = c("yellow", "red"), ylab = "Miles per Gallon", xlab = "Transmission Type")


### To understand the correlations, we use a pairs plot

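mtcars_vars <- mtcars[, c(1, 3, 5, 6, 7, 9, 10)]  # mpg, disp, drat, wt, qsec, am, gear
mar.orig <- par()$mar      # save the current margins
par(mar = c(1, 1, 1, 1))   # shrink the margins for the pairs plot
pairs(mtcars_vars, panel = panel.smooth, col = 9 + mtcars$wt)
par(mar = mar.orig)        # restore the saved margins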


### Visualizing the residuals

par(mfrow = c(2,2))
plot(multifit_model)


## Conclusion
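Under the multivariable model, the multiple R-squared is 0.8664, so the model explains about 87% of the variance in MPG. Adding cyl, disp, hp, and wt significantly improves the fit over the model with am alone (F = 18.965, p = 8.637e-08), and it changes the estimated relationship between mpg and am: the unadjusted difference of 7.25 MPG shrinks to 1.81 MPG, and the adjusted coefficient is no longer statistically significant (p = 0.2155). Hence, to answer our second question, the estimated difference between manual and automatic transmissions is 1.81 MPG in favor of manual once these variables are held constant.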
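To put a range around the 1.81 MPG estimate, a 95% confidence interval for the `amManual` coefficient can be taken from the fitted model. This is a minimal sketch on a local copy of the data, not a step from the original analysis; `cars_d`, `fit_adj`, and `ci_am` are illustrative names.

```r
# Local copy with the same factor coding used in the analysis
cars_d <- datasets::mtcars
cars_d$cyl <- factor(cars_d$cyl)
cars_d$am  <- factor(cars_d$am, labels = c("Automatic", "Manual"))

# Multivariable model from the analysis
fit_adj <- lm(mpg ~ am + cyl + disp + hp + wt, data = cars_d)

# Point estimate of the adjusted manual-vs-automatic difference
print(coef(fit_adj)[["amManual"]])

# 95% confidence interval for that coefficient
ci_am <- confint(fit_adj, "amManual", level = 0.95)
print(ci_am)
```

Because the p-value for `amManual` in the multivariable summary is 0.2155, this interval includes zero, so the adjusted difference should be read as an estimate with considerable uncertainty.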