Overview

‘Motor Trend’, a magazine about the automobile industry, is interested in exploring the relationship between a set of variables and miles per gallon (mpg).

They are particularly interested in the following:

> Is an automatic or manual transmission better for mpg?

> Quantify the MPG difference between automatic and manual transmissions

Executive Summary

This analysis concludes that:

On its own, a manual transmission is associated with 7.25 MPG more than an automatic transmission, but transmission type alone explains only 36% of the variation in MPG.

After adjusting for cylinders (cyl), horsepower (hp) and weight (wt), the manual advantage shrinks to 1.48 MPG and is no longer statistically significant (p = 0.31). This larger model explains 85% of the variation in MPG.

Required libraries

library(datasets)
library(ggplot2)
## Warning: package 'ggplot2' was built under R version 4.0.2

Data analysis

data(mtcars)
dim(mtcars)
## [1] 32 11
head(mtcars)
##                    mpg cyl disp  hp drat    wt  qsec vs am gear carb
## Mazda RX4         21.0   6  160 110 3.90 2.620 16.46  0  1    4    4
## Mazda RX4 Wag     21.0   6  160 110 3.90 2.875 17.02  0  1    4    4
## Datsun 710        22.8   4  108  93 3.85 2.320 18.61  1  1    4    1
## Hornet 4 Drive    21.4   6  258 110 3.08 3.215 19.44  1  0    3    1
## Hornet Sportabout 18.7   8  360 175 3.15 3.440 17.02  0  0    3    2
## Valiant           18.1   6  225 105 2.76 3.460 20.22  1  0    3    1
summary(mtcars$mpg)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   10.40   15.43   19.20   20.09   22.80   33.90

Exploratory data analysis

In the variable ‘am’, 0 represents an automatic transmission and 1 represents a manual transmission.

Transmission<-factor(mtcars$am, labels=c("Automatic", "Manual"))

## Plot for the transmission types

boxplot(mpg~Transmission, mtcars, col=c("blue", "green"))

Exploratory data analysis conclusion:

The boxplot shows that manual transmissions tend to give better mpg than automatic transmissions.
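
The visual impression from the boxplot can be backed with numbers. The sketch below is one possible check and not part of the original analysis: it tabulates the group sizes and the mean mpg per transmission type. The names `counts`, `group_means` and `mean_diff` are illustrative.

```r
# Illustrative check (not part of the original report):
# group sizes and mean mpg per transmission type
data(mtcars)
Transmission <- factor(mtcars$am, labels = c("Automatic", "Manual"))

# Number of cars in each transmission group
counts <- table(Transmission)

# Mean mpg per group; tapply returns a vector named by factor level
group_means <- tapply(mtcars$mpg, Transmission, mean)

# Difference in mean mpg (manual minus automatic)
mean_diff <- unname(group_means["Manual"] - group_means["Automatic"])

print(counts)
print(round(group_means, 2))
print(round(mean_diff, 2))
```

The difference in group means should match the 7.25 MPG figure quoted in the executive summary, since a regression on a single 0/1 variable estimates exactly this difference.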

Regression Models

Linear Regression model:

Test the hypothesis that transmission type affects mpg with a simple linear regression.

Linear<- lm(mpg~am, mtcars)
summary(Linear)
## 
## Call:
## lm(formula = mpg ~ am, data = mtcars)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -9.3923 -3.0923 -0.2974  3.2439  9.5077 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   17.147      1.125  15.247 1.13e-15 ***
## am             7.245      1.764   4.106 0.000285 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 4.902 on 30 degrees of freedom
## Multiple R-squared:  0.3598, Adjusted R-squared:  0.3385 
## F-statistic: 16.86 on 1 and 30 DF,  p-value: 0.000285

Conclusion of linear regression model

The p-value for am is 0.000285, well below 0.05, so we reject the null hypothesis that transmission type has no effect on mpg.

The model's R-squared is 0.3598: transmission type alone explains only about 36% of the variation in mpg, so the single am variable is not sufficient to explain mpg performance.
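
A point estimate alone does not show how uncertain the 7.25 MPG difference is. As a minimal sketch, assuming a 95% confidence level is wanted, `confint()` can report an interval for the am coefficient. The names `fit_am`, `ci_am` and `r2_am` are illustrative and do not appear in the original report.

```r
# Illustrative sketch: uncertainty around the single-variable estimate
data(mtcars)
fit_am <- lm(mpg ~ am, data = mtcars)

# 95% confidence interval for the automatic-to-manual difference in mpg
ci_am <- confint(fit_am, "am", level = 0.95)

# Share of mpg variance explained by transmission type alone
r2_am <- summary(fit_am)$r.squared

print(round(ci_am, 2))
print(round(r2_am, 4))
```

An interval that excludes zero is consistent with the small p-value reported by `summary()`.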

Multivariable Regression model:

## Check a model with several variables: cylinders (cyl), horsepower (hp) and weight (wt)

Multiple<-lm(mpg~am+cyl+hp+wt, mtcars)
summary(Multiple)
## 
## Call:
## lm(formula = mpg ~ am + cyl + hp + wt, data = mtcars)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -3.4765 -1.8471 -0.5544  1.2758  5.6608 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 36.14654    3.10478  11.642 4.94e-12 ***
## am           1.47805    1.44115   1.026   0.3142    
## cyl         -0.74516    0.58279  -1.279   0.2119    
## hp          -0.02495    0.01365  -1.828   0.0786 .  
## wt          -2.60648    0.91984  -2.834   0.0086 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2.509 on 27 degrees of freedom
## Multiple R-squared:  0.849,  Adjusted R-squared:  0.8267 
## F-statistic: 37.96 on 4 and 27 DF,  p-value: 1.025e-10

Conclusion for Multivariable regression

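The R-squared value shows that this model explains 85% of the variation in mpg. Only wt is significant at the 0.05 level (p = 0.0086); hp is marginal (p = 0.0786) and cyl is not significant (p = 0.2119). The am coefficient falls from 7.25 to 1.48 and is no longer significant (p = 0.3142), which suggests that these variables confound the relationship between mpg and am.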
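
One way to check whether the extra variables are worth adding is a nested-model F-test. The sketch below is an assumption about how this could be done, not a step from the original report. It compares the single-variable and multivariable fits with `anova()` and shows how far the am coefficient moves; the names `fit_am`, `fit_multi`, `cmp`, `p_nested` and `am_shift` are illustrative.

```r
# Illustrative sketch: does adding cyl, hp and wt improve on mpg ~ am?
data(mtcars)
fit_am    <- lm(mpg ~ am, data = mtcars)
fit_multi <- lm(mpg ~ am + cyl + hp + wt, data = mtcars)

# F-test comparing the nested models (the smaller model goes first)
cmp <- anova(fit_am, fit_multi)
p_nested <- cmp[["Pr(>F)"]][2]

# The am coefficient before and after adjustment
am_shift <- c(simple   = unname(coef(fit_am)["am"]),
              adjusted = unname(coef(fit_multi)["am"]))

print(cmp)
print(round(am_shift, 2))
```

A small p-value in the comparison indicates that the added variables explain variation in mpg that transmission type alone does not.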

Analysis of variance model

Check which variables account for a statistically significant share of the variation in mpg, using an analysis of variance on all variables.

model<-aov(mpg~., mtcars)
summary(model)
##             Df Sum Sq Mean Sq F value   Pr(>F)    
## cyl          1  817.7   817.7 116.425 5.03e-10 ***
## disp         1   37.6    37.6   5.353  0.03091 *  
## hp           1    9.4     9.4   1.334  0.26103    
## drat         1   16.5    16.5   2.345  0.14064    
## wt           1   77.5    77.5  11.031  0.00324 ** 
## qsec         1    3.9     3.9   0.562  0.46166    
## vs           1    0.1     0.1   0.018  0.89317    
## am           1   14.5    14.5   2.061  0.16586    
## gear         1    1.0     1.0   0.138  0.71365    
## carb         1    0.4     0.4   0.058  0.81218    
## Residuals   21  147.5     7.0                     
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## Variables with p-values below 0.05 (cyl, disp, wt) can be considered in addition to transmission type
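
A caveat when reading this table: `aov()` reports sequential sums of squares, so each row measures what a variable adds after the variables listed above it. The sketch below illustrates the effect on a two-variable model and is not part of the original analysis. It uses `anova()` on an `lm` fit, which gives the same kind of sequential table; the names `ss_am_first` and `ss_am_last` are illustrative.

```r
# Illustrative sketch: sequential sums of squares depend on term order
data(mtcars)

# am entered first: it is credited with all the variation it can explain
ss_am_first <- anova(lm(mpg ~ am + wt, data = mtcars))["am", "Sum Sq"]

# am entered after wt: it is credited only with what wt leaves unexplained
ss_am_last <- anova(lm(mpg ~ wt + am, data = mtcars))["am", "Sum Sq"]

print(c(am_first = ss_am_first, am_last = ss_am_last))
```

Because am sits near the end of the formula `mpg ~ .`, its row in the table above reflects only what it explains after cyl, disp, hp, drat, wt, qsec and vs.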

Appendix

Correlation of the variables

##  The matrix of scatter plots between mpg, am, wt and hp visualizes the relationship between each pair of variables

requiredVar<-mtcars[, c(1,9,6,4)]
par(mar=c(1,1,1,1))
pairs(requiredVar, panel=panel.smooth, col=9+mtcars$wt)
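
The scatter plot matrix shows the relationships visually. A correlation matrix for the same four variables gives the corresponding numbers. This is a minimal sketch, not part of the original report; it selects the columns by name rather than by position, and `vars` and `cor_mat` are illustrative names.

```r
# Illustrative sketch: pairwise Pearson correlations for the plotted variables
data(mtcars)

# Same columns as mtcars[, c(1, 9, 6, 4)], selected by name for readability
vars <- mtcars[, c("mpg", "am", "wt", "hp")]

cor_mat <- cor(vars)
print(round(cor_mat, 2))
```

The squared correlation between mpg and am equals the R-squared of the single-variable regression, and the negative correlation between am and wt is consistent with weight acting as a confounder.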

Residual plots and Diagnostics

## Scatter plots of the multiple variable regression model residuals

par(mfrow=c(2,2))
plot(Multiple, col="green")

The Residuals vs Fitted plot shows no strong pattern, suggesting that the residuals are roughly homoscedastic.

The Normal Q-Q plot shows that they are approximately normally distributed, except for a few outliers.
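
The plots can be complemented with numerical diagnostics. The sketch below is one possible approach and was not run in the original report: a Shapiro-Wilk test on the residuals and Cook's distance to flag the most influential cars. The names `fit_multi`, `res`, `sw`, `cooks` and `top_influence` are illustrative.

```r
# Illustrative sketch: numerical checks on the multivariable model residuals
data(mtcars)
fit_multi <- lm(mpg ~ am + cyl + hp + wt, data = mtcars)
res <- residuals(fit_multi)

# Shapiro-Wilk test: a small p-value would indicate non-normal residuals
sw <- shapiro.test(res)

# Cook's distance: the three cars with the most influence on the fit
cooks <- cooks.distance(fit_multi)
top_influence <- head(sort(cooks, decreasing = TRUE), 3)

print(sw)
print(round(top_influence, 3))
```

With only 32 observations, such tests have limited power, so they are best read alongside the plots rather than in place of them.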