Motor Trend Analysis

Summary

In this report we analyze the mtcars dataset to answer two key questions for “Motor Trend”. We are particularly interested in the following:

1. “Is an automatic or manual transmission better for MPG”

2. “Quantify the MPG difference between automatic and manual transmissions”

Exploration

Let us first do some initial analysis of the dataset and try to identify key features or trends, if any.

##                    mpg cyl disp  hp drat    wt  qsec vs am gear carb
## Mazda RX4         21.0   6  160 110 3.90 2.620 16.46  0  1    4    4
## Mazda RX4 Wag     21.0   6  160 110 3.90 2.875 17.02  0  1    4    4
## Datsun 710        22.8   4  108  93 3.85 2.320 18.61  1  1    4    1
## Hornet 4 Drive    21.4   6  258 110 3.08 3.215 19.44  1  0    3    1
## Hornet Sportabout 18.7   8  360 175 3.15 3.440 17.02  0  0    3    2
## Valiant           18.1   6  225 105 2.76 3.460 20.22  1  0    3    1

Here we are particularly interested in two variables: mpg (miles per gallon) and am (transmission type, coded 1 for manual and 0 for automatic).
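As a minimal sketch of this first look, assuming the copy of mtcars that ships with base R (the `datasets` package), the two variables of interest can be inspected as follows.

```r
# mtcars ships with base R (package 'datasets'), so nothing needs downloading.
data(mtcars)

# First six rows of the data frame.
head(mtcars)

# am is stored as a numeric 0/1 column: 0 = automatic, 1 = manual.
am_counts <- table(mtcars$am)
print(am_counts)

# Range, quartiles and mean of the outcome variable.
summary(mtcars$mpg)
```

The counts show how many cars fall into each transmission group, which matters later when the group means are compared.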

Next, we subset the data we are interested in (mpg and am) and convert am to a factor variable.
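The report's own plotting code is not shown. The sketch below is one way to do the subsetting and draw a comparison plot with base R graphics; the data frame name `mpg_am`, the factor labels and the choice of a box plot are assumptions made for illustration.

```r
data(mtcars)

# Keep only the two columns of interest (mpg_am is a hypothetical name).
mpg_am <- mtcars[, c("mpg", "am")]

# Convert the 0/1 coding into a labelled factor.
mpg_am$am <- factor(mpg_am$am, levels = c(0, 1),
                    labels = c("automatic", "manual"))

# Box plot of mpg by transmission type.
boxplot(mpg ~ am, data = mpg_am,
        xlab = "Transmission", ylab = "Miles per gallon",
        main = "MPG by transmission type")

# Mean mpg in each group, to go with the plot.
group_means <- tapply(mpg_am$mpg, mpg_am$am, mean)
print(round(group_means, 2))
```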

The plot above suggests that manual transmission cars tend to have better fuel economy (higher mpg) than their automatic counterparts. This answers our first question, although we will delve deeper.

Let us apply a t-test (with a 95% confidence interval, i.e. a 5% type I error rate) to confirm this result. The null hypothesis is that there is no difference in mean MPG between manual and automatic transmissions.

## 
##  Welch Two Sample t-test
## 
## data:  mtcars$mpg by mtcars$am
## t = -3.7671, df = 18.332, p-value = 0.001374
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -11.280194  -3.209684
## sample estimates:
## mean in group 0 mean in group 1 
##        17.14737        24.39231

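Since the p-value (0.0014) is below 0.05, we reject the null hypothesis: mean MPG differs between the two groups. The 95% confidence interval for the difference (automatic minus manual) runs from -11.28 to -3.21 mpg.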
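The test output above can be reproduced with a call along these lines. This is a sketch using the formula interface of `t.test`; the output header suggests the original call passed `mtcars$mpg ~ mtcars$am` directly, which gives the same numbers.

```r
data(mtcars)

# Welch two-sample t-test (unequal variances), 95% confidence level.
# Group 0 = automatic, group 1 = manual.
tt <- t.test(mpg ~ am, data = mtcars, var.equal = FALSE, conf.level = 0.95)
print(tt)

# Individual pieces of the result.
tt$statistic   # t value
tt$p.value     # two-sided p-value
tt$conf.int    # interval for mean(group 0) - mean(group 1)
```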

Regression Models

Let us now fit a regression model to describe the relationship between the predictor (transmission) and the outcome (mpg). Of the three regression models considered (linear, Poisson and binomial), we can eliminate binomial straight away because the outcome (mpg) is not binary. A binomial analysis would only be possible after creating a binary variable from some mpg threshold, for instance treating an mpg greater than some value as a success (1).

A Poisson model is used for count data, rates or proportions. Although mpg is a rate, it is not a rate in the time domain (a gallon is not a time unit), so Poisson is not a natural choice either. That leaves the linear model, and we can cross-check this choice using the variance of the data: the variance of the outcome (Y_i) is constant (sigma^2) under a linear model, whereas under a Poisson model it depends on (in fact equals) the mean (mu_i).

The fit can also be judged from the residuals and whether they exhibit any pattern. The more random and pattern-free the residual distribution, the better the model fits.

Linear regression equation: mpg = beta0 + beta1 * am + error

where: mpg is the outcome

beta0 is the intercept, i.e. the expected mpg when am = 0 (automatic)

beta1 is the slope coefficient, i.e. the expected change in mpg when am goes from 0 to 1

am is the predictor

error is the portion of mpg left unexplained by the model

##                    Estimate Std. Error   t value     Pr(>|t|)
## (Intercept)       17.147368   1.124603 15.247492 1.133983e-15
## I(as.factor(am))1  7.244939   1.764422  4.106127 2.850207e-04
##                   Estimate Std. Error  t value     Pr(>|t|)
## I(as.factor(am))0 17.14737   1.124603 15.24749 1.133983e-15
## I(as.factor(am))1 24.39231   1.359578 17.94109 1.376283e-17

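The first set of coefficients is from the model with an intercept, and the second from the model without one. With the intercept, the am coefficient is the difference in mean mpg between the two groups: going from 0 to 1, i.e. from automatic to manual transmission, is associated with an average increase of about 7.2 mpg. Without the intercept, the two coefficients are simply the group means (17.15 and 24.39 mpg).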
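A minimal sketch of the two fits follows. The coefficient labels in the output above suggest the report wrapped the predictor as `I(as.factor(am))`; plain `factor(am)` is used here instead, so the row names differ slightly while the estimates are the same. The object names are hypothetical.

```r
data(mtcars)

# Model with an intercept: intercept = automatic mean,
# slope = manual mean minus automatic mean.
fit_int <- lm(mpg ~ factor(am), data = mtcars)
print(summary(fit_int)$coefficients)

# Model without an intercept: each coefficient is a group mean.
fit_noint <- lm(mpg ~ factor(am) - 1, data = mtcars)
print(summary(fit_noint)$coefficients)

# The slope of the first model equals the gap between the two group means.
slope <- coef(fit_int)[["factor(am)1"]]
group_means <- unname(coef(fit_noint))
print(c(slope = slope, gap = diff(group_means)))
```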

## 
## Call:
## lm(formula = mpg ~ wt + qsec + am, data = mtcars)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -3.4811 -1.5555 -0.7257  1.4110  4.6610 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   9.6178     6.9596   1.382 0.177915    
## wt           -3.9165     0.7112  -5.507 6.95e-06 ***
## qsec          1.2259     0.2887   4.247 0.000216 ***
## am            2.9358     1.4109   2.081 0.046716 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2.459 on 28 degrees of freedom
## Multiple R-squared:  0.8497, Adjusted R-squared:  0.8336 
## F-statistic: 52.75 on 3 and 28 DF,  p-value: 1.21e-11
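The call shown in this summary can be fitted directly. How the three predictors were chosen is not shown in the report; the second half of the sketch assumes backward stepwise selection by AIC with `step()`, which is one common approach and may or may not match what was actually done.

```r
data(mtcars)

# The three-predictor model exactly as it appears in the summary's Call line.
fit3 <- lm(mpg ~ wt + qsec + am, data = mtcars)
print(summary(fit3))

# Assumption: one possible selection procedure, not confirmed by the report.
# Start from the model with every variable and drop terms by AIC.
full <- lm(mpg ~ ., data = mtcars)
selected <- step(full, direction = "backward", trace = 0)
print(formula(selected))
```

Comparing `formula(selected)` with the formula of `fit3` shows whether this particular procedure arrives at the same set of predictors.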

The above output (the best-fit model) suggests that the final model should use am, qsec (1/4 mile time) and wt (weight, lb/1000) as predictors. Let us also check the residuals to see whether this model is a good fit.
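The residual plots themselves did not survive in this copy of the report. As a sketch, two standard diagnostics for the fitted model can be drawn with base R; which two plots the report originally showed is not known, so the choice of residuals versus fitted values and a normal Q-Q plot is an assumption.

```r
data(mtcars)
fit3 <- lm(mpg ~ wt + qsec + am, data = mtcars)

# Two panels side by side; restore the graphics settings afterwards.
op <- par(mfrow = c(1, 2))
plot(fit3, which = 1)  # residuals vs fitted values: look for curvature or funnels
plot(fit3, which = 2)  # normal Q-Q plot of standardized residuals
par(op)

# Share of the variance in mpg explained by the model.
r2 <- summary(fit3)$r.squared
print(round(r2, 4))
```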

No clear pattern is visible in the two residual plots, which suggests that the model is not a misfit. It also explains almost 85% of the variance in the data (multiple R-squared of 0.8497).

Conclusion

From the analysis in the last section, the best model is the one that uses am, qsec and wt as predictors. Switching from automatic to manual transmission increases mpg by about 2.9 on average (p = 0.047) when qsec and wt are also included in the model, that is, once their linear contribution has been removed from both the outcome (mpg) and the predictor (am). This is well below the unadjusted difference of about 7.2 mpg found with am as the only predictor.
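The idea of removing the contribution of qsec and wt from both the outcome and the predictor can be checked numerically. The sketch below regresses mpg and am separately on wt and qsec, then regresses one set of residuals on the other; the resulting slope should match the am coefficient of the full model. The object names are hypothetical.

```r
data(mtcars)

# Full model: the am coefficient is the adjusted effect of transmission.
full <- lm(mpg ~ wt + qsec + am, data = mtcars)

# Remove the linear contribution of wt and qsec from the outcome ...
res_mpg <- resid(lm(mpg ~ wt + qsec, data = mtcars))
# ... and from the predictor of interest.
res_am <- resid(lm(am ~ wt + qsec, data = mtcars))

# Regress what is left of mpg on what is left of am.
partial <- lm(res_mpg ~ res_am)

# The two estimates should agree up to floating-point error.
print(c(full_model = coef(full)[["am"]],
        residual_on_residual = coef(partial)[["res_am"]]))
```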