Executive summary

The purpose of this analysis is to compare the fuel economy (MPG) of automatic and manual transmission cars in the mtcars dataset. The analysis finds a significant MPG difference between automatic and manual transmission cars, as explained in the later sections of this document. It answers the following questions:

“Is an automatic or manual transmission better for MPG”

“Quantify the MPG difference between automatic and manual transmissions”

Exploratory data analysis

“mtcars” is the primary dataset for this analysis. This section explores the relationship between mpg and transmission type.

data("mtcars")
head(mtcars)
#transform to factor variable
mtcars$cyl <- factor(mtcars$cyl)
mtcars$vs <- factor(mtcars$vs)
mtcars$gear <- factor(mtcars$gear)
mtcars$carb <- factor(mtcars$carb)
mtcars$am <- factor(mtcars$am,labels = c("Automatic","Manual"))
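
As a quick sanity check on the recoding above, the sketch below counts the cars in each transmission group. It works on a fresh copy of the dataset; the names `cars` and `am_counts` are illustrative and not part of the original analysis.

```r
# Work on a fresh copy so the example does not depend on earlier state.
cars <- datasets::mtcars

# In mtcars, am is coded 0 = automatic and 1 = manual,
# so the labels are applied in that order.
cars$am <- factor(cars$am, labels = c("Automatic", "Manual"))

# Confirm the factor levels and count the cars in each group.
levels(cars$am)
am_counts <- table(cars$am)
am_counts
```

The groups are unbalanced (19 automatic cars and 13 manual cars), which matters for any plot or summary that adds values up per group instead of averaging them.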

Including Plots

The exploratory data analysis compares the two transmission types using three different plots, as listed below:

## Plot 1 - boxplot to show the difference
plot(mtcars$mpg ~ mtcars$am,col = (c("red","blue")), ylab = "Miles Per Gallon", xlab = "Transmission Type")

##Plot 2 - Pairs plot for the data set
pairs(mpg ~ ., data = mtcars)

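## Plot 3 - bar plot to show the mean MPG of the two transmission types
library(ggplot2)  # provides ggplot(), aes() and geom_bar()
# stat = "summary" with fun = "mean" draws the group mean (needs ggplot2 >= 3.3.0);
# stat = "identity" would stack every car's mpg and show group totals instead
ggplot(data = mtcars, aes(x = am, y = mpg, fill = am)) +
  geom_bar(stat = "summary", fun = "mean", width = 0.5) +
  xlab("Transmission Type") + ylab("Miles Per Gallon")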

Regression analysis

In the exploratory data analysis, the plots suggested that manual transmission cars achieve higher miles per gallon than automatic transmission cars. The mean MPG of each group is:

aggregate(mpg~am,data=mtcars,mean)

The above calculation shows that automatic transmission cars average about 7.24 MPG less than manual cars (17.15 vs. 24.39). Let’s test this difference with a t-test:

library(dplyr)  # filter() below is dplyr::filter, not stats::filter
automatic_car <- filter(mtcars, am == "Automatic")
manual_car <- filter(mtcars, am == "Manual")
t.test(automatic_car$mpg,manual_car$mpg)
## 
##  Welch Two Sample t-test
## 
## data:  automatic_car$mpg and manual_car$mpg
## t = -3.7671, df = 18.332, p-value = 0.001374
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -11.280194  -3.209684
## sample estimates:
## mean of x mean of y 
##  17.14737  24.39231
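
The same Welch test can also be written with the formula interface of `t.test`, which avoids splitting the data into two data frames first. This is a minimal sketch on a fresh copy of the data; `cars` and `tt` are illustrative names.

```r
# Fresh copy with the same transmission labels as in the analysis.
cars <- datasets::mtcars
cars$am <- factor(cars$am, labels = c("Automatic", "Manual"))

# Welch two-sample t-test of mpg by transmission type.
# The first factor level (Automatic) is treated as group x.
tt <- t.test(mpg ~ am, data = cars)

tt$estimate   # group means
tt$conf.int   # 95% confidence interval for the difference in means
tt$p.value    # two-sided p-value
```

The components of the returned object should match the printed output above, so they can be reused in later calculations without copying numbers by hand.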

The p-value is 0.001374, which is below 0.05, so we reject the null hypothesis of equal means: the mean MPG of the two transmission types differs significantly. Now we quantify this difference using a linear model:

sv_lm <- lm(mpg~am,data=mtcars)
summary(sv_lm)
## 
## Call:
## lm(formula = mpg ~ am, data = mtcars)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -9.3923 -3.0923 -0.2974  3.2439  9.5077 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   17.147      1.125  15.247 1.13e-15 ***
## amManual       7.245      1.764   4.106 0.000285 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 4.902 on 30 degrees of freedom
## Multiple R-squared:  0.3598, Adjusted R-squared:  0.3385 
## F-statistic: 16.86 on 1 and 30 DF,  p-value: 0.000285
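
With a single two-level factor as predictor, the coefficients have a direct reading: the intercept is the mean MPG of the reference group (Automatic), and `amManual` is the difference between the two group means. The sketch below checks this on a fresh copy of the data; `cars`, `fit`, `b` and `group_means` are illustrative names.

```r
cars <- datasets::mtcars
cars$am <- factor(cars$am, labels = c("Automatic", "Manual"))

# Same single-variable model as above.
fit <- lm(mpg ~ am, data = cars)
b <- coef(fit)

# Mean mpg per transmission group, computed directly.
group_means <- tapply(cars$mpg, cars$am, mean)

# Intercept equals the Automatic mean ...
all.equal(unname(b["(Intercept)"]), unname(group_means["Automatic"]))
# ... and intercept + amManual equals the Manual mean.
all.equal(unname(b["(Intercept)"] + b["amManual"]), unname(group_means["Manual"]))
```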

R-squared = 0.3598 means this linear model explains only about 36% of the variance in mpg. To improve the model, we need to add more predictors. Let’s look at a multivariate linear model. In the new model, we consider other variables such as cyl, wt, disp and hp, which are strongly correlated with mpg.
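
As a rough check on the correlation claim, the sketch below computes the Pearson correlation of mpg with each of the four candidate predictors. It uses an untouched copy of the dataset, because `cor` needs numeric columns and `cyl` was converted to a factor earlier; `cars` and `mpg_cor` are illustrative names.

```r
# Untouched copy: cyl is still numeric here.
cars <- datasets::mtcars

# Correlation of mpg with each candidate predictor (1 x 4 matrix).
mpg_cor <- cor(cars$mpg, cars[, c("cyl", "wt", "disp", "hp")])
round(mpg_cor, 2)
```

All four correlations are negative and stronger than -0.7, so heavier, larger-engined and more powerful cars tend to have lower MPG.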

mv_lm <- lm(mpg~am + wt + cyl + disp + hp,data=mtcars)
summary(mv_lm)
## 
## Call:
## lm(formula = mpg ~ am + wt + cyl + disp + hp, data = mtcars)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -3.9374 -1.3347 -0.3903  1.1910  5.0757 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 33.864276   2.695416  12.564 2.67e-12 ***
## amManual     1.806099   1.421079   1.271   0.2155    
## wt          -2.738695   1.175978  -2.329   0.0282 *  
## cyl6        -3.136067   1.469090  -2.135   0.0428 *  
## cyl8        -2.717781   2.898149  -0.938   0.3573    
## disp         0.004088   0.012767   0.320   0.7515    
## hp          -0.032480   0.013983  -2.323   0.0286 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2.453 on 25 degrees of freedom
## Multiple R-squared:  0.8664, Adjusted R-squared:  0.8344 
## F-statistic: 27.03 on 6 and 25 DF,  p-value: 8.861e-10

The adjusted R-squared is 0.83 (multiple R-squared 0.87), so this model explains far more of the variance in mpg than the single-variable linear model. This indicates that the added variables (wt, cyl, disp, hp) have a strong impact on mpg. The overall p-value is also very low (8.861e-10), which suggests that this model is a better fit for relating mpg to transmission type and the other correlated variables. Note that, with these variables included, the amManual coefficient is no longer statistically significant (p = 0.2155).
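
To express the uncertainty around the adjusted transmission effect, a confidence interval for the `amManual` coefficient can be extracted with `confint`. This is a minimal sketch that refits the same model on a fresh copy of the data; `cars`, `fit_mv` and `ci_am` are illustrative names.

```r
cars <- datasets::mtcars
cars$cyl <- factor(cars$cyl)
cars$am <- factor(cars$am, labels = c("Automatic", "Manual"))

# Same multivariate model as above.
fit_mv <- lm(mpg ~ am + wt + cyl + disp + hp, data = cars)

# 95% confidence interval for the manual-vs-automatic coefficient.
ci_am <- confint(fit_mv, "amManual", level = 0.95)
ci_am
```

The interval spans zero, which is consistent with the non-significant p-value for `amManual` in the summary above.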

Compare the two nested linear models using the anova function:

anova(sv_lm,mv_lm)
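
For two nested models, `anova` performs an F-test on the reduction in residual sum of squares. The sketch below reproduces that F statistic by hand and compares it with the value reported by `anova`. It refits both models on a fresh copy of the data; all object names (`cars`, `small`, `large`, `f_stat`, `p_val`, `cmp`) are illustrative.

```r
cars <- datasets::mtcars
cars$cyl <- factor(cars$cyl)
cars$am <- factor(cars$am, labels = c("Automatic", "Manual"))

small <- lm(mpg ~ am, data = cars)
large <- lm(mpg ~ am + wt + cyl + disp + hp, data = cars)

# Residual sums of squares and the number of extra parameters.
rss_small <- deviance(small)
rss_large <- deviance(large)
df_diff <- df.residual(small) - df.residual(large)

# F = (drop in RSS per extra parameter) / (residual variance of the larger model)
f_stat <- ((rss_small - rss_large) / df_diff) / (rss_large / df.residual(large))
p_val <- pf(f_stat, df_diff, df.residual(large), lower.tail = FALSE)

# The hand-computed statistic should agree with anova().
cmp <- anova(small, large)
all.equal(f_stat, cmp$F[2])
```

A small p-value here indicates that the added predictors jointly improve the fit beyond what transmission type alone explains.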

Conclusion

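After adjusting for wt, cyl, disp and hp, the estimated MPG difference between manual and automatic transmissions is around 1.8 MPG in favour of manual cars, although this adjusted difference is not statistically significant (p = 0.2155). Without adjustment, manual cars average about 7.24 MPG more than automatic cars.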