In this report we analyze the mtcars dataset to answer some key questions for “Motor Trend”. We are particularly interested in the following questions:
Let us first do some initial analysis of the dataset and try to identify key features or trends, if any.
##                    mpg cyl disp  hp drat    wt  qsec vs am gear carb
## Mazda RX4         21.0   6  160 110 3.90 2.620 16.46  0  1    4    4
## Mazda RX4 Wag     21.0   6  160 110 3.90 2.875 17.02  0  1    4    4
## Datsun 710        22.8   4  108  93 3.85 2.320 18.61  1  1    4    1
## Hornet 4 Drive    21.4   6  258 110 3.08 3.215 19.44  1  0    3    1
## Hornet Sportabout 18.7   8  360 175 3.15 3.440 17.02  0  0    3    2
## Valiant           18.1   6  225 105 2.76 3.460 20.22  1  0    3    1
Here we are particularly interested in two variables, mpg and am: mpg is “miles per gallon”, and am is the transmission type, with value 1 for manual and 0 for automatic.
Next, we subset the variables we are interested in (mpg and am) and convert am to a factor variable.
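The subsetting and conversion step can be sketched in base R as follows. This is a minimal sketch, not the original code: the data frame name `cars`, the factor labels, and the choice of a box plot drawn with base graphics are all assumptions made for illustration.

```r
# Keep only the two variables of interest (mtcars ships with base R).
# The name `cars` is a hypothetical choice for this sketch.
cars <- mtcars[, c("mpg", "am")]

# Recode am (0/1) as a labelled factor; the labels are our own choice.
cars$am <- factor(cars$am, levels = c(0, 1),
                  labels = c("Automatic", "Manual"))

# Box plot of mpg by transmission type (assumed plot type).
boxplot(mpg ~ am, data = cars,
        xlab = "Transmission", ylab = "Miles per gallon",
        main = "MPG by transmission type")
```

Labelling the factor levels keeps the plot axis readable without having to remember which of 0 and 1 is manual.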
The plot above suggests that manual transmission cars tend to have better fuel economy (higher mpg) than their automatic counterparts. This answers our first question, although we will delve deeper.
Let us apply a t-test (95% confidence level, i.e. a 5% type I error rate) to confirm this result. The null hypothesis is that mean mpg is the same for manual and automatic transmissions.
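A call of the following form runs this test. The formula `mtcars$mpg ~ mtcars$am` is taken from the `data:` line of the output; the object name `res` is an assumption introduced here for illustration.

```r
# Welch two-sample t-test of mpg by transmission (am: 0 = automatic, 1 = manual).
# t.test() defaults to var.equal = FALSE, which gives the Welch variant.
res <- t.test(mtcars$mpg ~ mtcars$am, conf.level = 0.95)
print(res)

res$p.value   # p-value for the null hypothesis of equal means
res$conf.int  # 95% confidence interval for (automatic mean - manual mean)
```

The Welch variant does not assume equal variances in the two groups, which is why the reported degrees of freedom are fractional.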
##
## Welch Two Sample t-test
##
## data: mtcars$mpg by mtcars$am
## t = -3.7671, df = 18.332, p-value = 0.001374
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -11.280194 -3.209684
## sample estimates:
## mean in group 0 mean in group 1
##        17.14737        24.39231
Since the p-value (0.0014) is below 0.05, we reject the null hypothesis: mean mpg differs between the two transmission types. The 95% confidence interval for the difference (automatic minus manual) runs from -11.28 to -3.21 mpg and excludes zero.
Let us now fit a regression model to examine the relationship between the predictor (transmission) and the outcome (mpg). Of the three regression models considered (linear, Poisson and binomial), we can eliminate binomial straight away because the outcome (mpg) is not binary. A binomial analysis would only be possible after creating a binary variable from some mpg threshold, for instance treating an mpg greater than some value as a success (1) for automatic transmission.
A Poisson model is used for modeling count data, rates or proportions. Although mpg is a rate, it is not a rate in the time domain (a gallon is not a time parameter), so Poisson is not a natural choice either. The best fit is therefore the linear model, and we can cross-check this using the variance of the data: the variance of the outcome (Y_i) is constant (sigma^2) in a linear model and depends on the mean (mu_i) in a Poisson model. The fit can also be judged by looking at the residuals and whether they exhibit any pattern. The more random and pattern-free the residuals are, the better the model fits.
Linear regression equation: mpg = beta0 + beta1 * am + error
where: mpg is the outcome
beta0 is the intercept, i.e. the expected mpg when am = 0 (automatic)
beta1 is the slope coefficient, i.e. the change in expected mpg when am goes from 0 to 1
am is the predictor
error is the portion unexplained by the model
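The equation above can be fitted with `lm()`, once with and once without an intercept. This is a sketch under assumptions: the object names `fit_int` and `fit_noint` are hypothetical, and `factor(am)` is used in place of the `I(as.factor(am))` wrapper seen in the output row names, so the labels differ slightly while the estimates are the same.

```r
# Model with an intercept: beta0 is the automatic-group mean,
# beta1 is the manual-minus-automatic difference in mean mpg.
fit_int <- lm(mpg ~ factor(am), data = mtcars)
summary(fit_int)$coefficients

# Model without an intercept: one coefficient per transmission type,
# each equal to that group's mean mpg.
fit_noint <- lm(mpg ~ factor(am) - 1, data = mtcars)
summary(fit_noint)$coefficients
```

Dropping the intercept does not change the fitted values. It only changes the parameterization, which is why the second table reports group means rather than a baseline and a difference.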
##                    Estimate Std. Error   t value     Pr(>|t|)
## (Intercept)       17.147368   1.124603 15.247492 1.133983e-15
## I(as.factor(am))1  7.244939   1.764422  4.106127 2.850207e-04
##                   Estimate Std. Error  t value     Pr(>|t|)
## I(as.factor(am))0 17.14737   1.124603 15.24749 1.133983e-15
## I(as.factor(am))1 24.39231   1.359578 17.94109 1.376283e-17
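The first set of coefficients is from the model with an intercept and the second from the model without one. With the intercept, the am coefficient is the difference between the group means: going from 0 to 1, i.e. from automatic to manual transmission, is associated with an increase of about 7.2 mpg (24.39 - 17.15). Without the intercept, the two coefficients are the group means themselves.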
##
## Call:
## lm(formula = mpg ~ wt + qsec + am, data = mtcars)
##
## Residuals:
##     Min      1Q  Median      3Q     Max
## -3.4811 -1.5555 -0.7257  1.4110  4.6610
##
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept)   9.6178     6.9596   1.382 0.177915
## wt           -3.9165     0.7112  -5.507 6.95e-06 ***
## qsec          1.2259     0.2887   4.247 0.000216 ***
## am            2.9358     1.4109   2.081 0.046716 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.459 on 28 degrees of freedom
## Multiple R-squared: 0.8497, Adjusted R-squared: 0.8336
## F-statistic: 52.75 on 3 and 28 DF, p-value: 1.21e-11
The above output (best-fit model) suggests that the final model should use am, qsec (1/4 mile time) and wt (weight in 1000 lb) as predictors. Let us also check the residuals to see whether this model is a good fit.
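A residual check for this model could look like the sketch below. The formula comes from the `Call:` line of the output; the object name `fit` and the choice of these two particular diagnostics (residuals against fitted values, and a normal Q-Q plot) are assumptions, since the original plotting code is not shown.

```r
# Refit the three-predictor model from the Call: line (am kept numeric, as in the output).
fit <- lm(mpg ~ wt + qsec + am, data = mtcars)

# Residuals vs fitted values: look for curvature or a funnel shape.
plot(fitted(fit), resid(fit),
     xlab = "Fitted mpg", ylab = "Residual",
     main = "Residuals vs fitted")
abline(h = 0, lty = 2)

# Normal Q-Q plot: points close to the line suggest roughly normal errors.
qqnorm(resid(fit))
qqline(resid(fit))

# Share of the variance in mpg explained by the model.
summary(fit)$r.squared
```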
No clear pattern is visible in the two residual plots above, which suggests that the model is not a misfit. It also explains about 85% of the variance in the data (multiple R-squared 0.8497).
From the analysis in the last section, the best model is the one that takes am, qsec and wt as predictors. Switching from automatic to manual transmission increases mpg by about 2.9 when qsec and wt are also included in the model, that is, after their linear effect has been removed from both the outcome (mpg) and the predictor (am).
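The statement about removing the effect of qsec and wt from both outcome and predictor can be checked directly. The sketch below residualizes mpg and am on wt and qsec, then regresses one set of residuals on the other. The variable names (`mpg_adj`, `am_adj`, `adj_fit`, `full_fit`) are hypothetical and introduced only for this illustration.

```r
# Remove the linear effect of wt and qsec from the outcome and from the predictor.
mpg_adj <- resid(lm(mpg ~ wt + qsec, data = mtcars))
am_adj  <- resid(lm(am ~ wt + qsec, data = mtcars))

# Regress adjusted outcome on adjusted predictor. No intercept is needed
# because both residual vectors already have mean zero.
adj_fit <- lm(mpg_adj ~ am_adj - 1)

# Compare with the am coefficient from the full model.
full_fit <- lm(mpg ~ wt + qsec + am, data = mtcars)
c(adjusted = unname(coef(adj_fit)),
  full     = unname(coef(full_fit)["am"]))
```

The two numbers agree, which is the sense in which the am coefficient in the multivariable model is "adjusted" for weight and quarter-mile time.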