Motor Trend magazine would like to explore the relationship between miles per gallon (MPG) and transmission type (automatic vs. manual). This analytical project uses regression models to analyse the famous “mtcars” dataset, extracted from the same magazine in 1974, to answer two questions: which transmission type is better for MPG, and how large the MPG difference between the two is.
Hypothesis testing and simple linear regression both indicate a significant difference between the mean MPG of automatic and manual transmission cars, with the latter getting 7.245 more MPG on average. To adjust for confounding variables such as the weight and horsepower of the cars, we use multivariate regression to get a better estimate of the impact of transmission type on MPG. After validating the model, the multivariate regression estimates that, on average, manual transmission cars get 2.084 miles per gallon more than automatic transmission cars, although this adjusted difference is not statistically significant (p = 0.141). One possible explanation for a manual advantage is the driver’s greater control over shifting and RPM, but the data do not test this.
See Figures 1 and 2.
## mean in group Automatic mean in group Manual
## 17.14737 24.39231
## [1] 0.001373638
## [1] -11.280194 -3.209684
Apart from transmission type, the variables most highly correlated with mpg are cylinders (cyl), displacement (disp), horsepower (hp), and weight (wt). See Figure 3.
And yet again, there are no apparent outliers in our dataset. However, we can easily see a difference in MPG by transmission type. As suspected, manual transmission cars seem to get better miles per gallon than automatic ones. Still, we should dig deeper and build a linear model. A linear model assumes no multicollinearity among the predictors, homoscedasticity, and i.i.d., approximately normally distributed errors.
The mean MPG of manual transmission cars is 7.245 MPG higher than that of automatic transmission cars. Is a difference of that size statistically significant? We set our alpha level at 0.05 and run a t-test to find out.
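The t-test reported in this section can be reproduced along the following lines. This is a minimal sketch: the names `autoData` and `manualData` come from the test output, while the way they are built here (subsetting on `am`, coded 0 = automatic and 1 = manual in `mtcars`) is an assumption.

```r
# mtcars ships with base R (datasets package); am is coded 0 = automatic, 1 = manual
autoData   <- mtcars[mtcars$am == 0, ]   # assumed construction of the two groups
manualData <- mtcars[mtcars$am == 1, ]

# t.test() defaults to the Welch test (var.equal = FALSE) with a 95% confidence level
ttest <- t.test(autoData$mpg, manualData$mpg)

ttest$estimate   # group means
ttest$p.value    # compare against the chosen alpha level
ttest$conf.int   # 95% CI for (automatic mean - manual mean)
```

Because the automatic group is passed first, the confidence interval is for the automatic mean minus the manual mean, which is why both of its bounds are negative.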
##
## Welch Two Sample t-test
##
## data: autoData$mpg and manualData$mpg
## t = -3.7671, df = 18.332, p-value = 0.001374
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -11.280194 -3.209684
## sample estimates:
## mean of x mean of y
## 17.14737 24.39231
Correlation
## wt cyl disp hp carb qsec
## -0.8676594 -0.8521620 -0.8475514 -0.7761684 -0.5509251 0.4186840
## gear am vs drat mpg
## 0.4802848 0.5998324 0.6640389 0.6811719 1.0000000
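The sorted correlations above could be produced with something like the following. It is a sketch rather than the report's original code, and the variable name `mpg_cor` is hypothetical.

```r
# Correlation matrix of all mtcars columns; keep only the mpg column
mpg_cor <- cor(mtcars)[, "mpg"]

# Sort ascending so the strongest negative correlations come first
mpg_cor <- sort(mpg_cor)
round(mpg_cor, 4)
```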
In addition to am (transmission type), which is included in our regression model by default, we see that wt, cyl, disp, and hp are highly correlated with our dependent variable mpg (visualised in Figure 3). For that reason they are candidates for inclusion in our model. However, if we look at the correlation matrix, we also see that cyl and disp are highly correlated with each other. Since predictors should not exhibit collinearity, we leave cyl and disp out of our model and keep wt and hp.
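The collinearity check described here can be sketched as a pairwise correlation matrix of the candidate predictors. The 0.8 cut-off and the names `pred_cor`, `threshold`, and `high` are illustrative assumptions, not taken from the report.

```r
# Pairwise correlations among the candidate predictors
predictors <- c("wt", "cyl", "disp", "hp")
pred_cor <- cor(mtcars[, predictors])
round(pred_cor, 3)

# List predictor pairs whose absolute correlation exceeds a threshold
threshold <- 0.8   # illustrative cut-off, not from the report
high <- which(abs(pred_cor) > threshold & upper.tri(pred_cor), arr.ind = TRUE)
data.frame(var1 = rownames(pred_cor)[high[, "row"]],
           var2 = colnames(pred_cor)[high[, "col"]],
           r    = pred_cor[high])
```

Using `upper.tri()` keeps each pair once and drops the diagonal, so the listing is not cluttered with self-correlations of 1.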
Including wt and hp in our regression equation makes sense intuitively - heavier cars and cars that have more horsepower should have lower MPGs.
##
## Call:
## lm(formula = mpg ~ am, data = mtcars)
##
## Residuals:
## Min 1Q Median 3Q Max
## -9.3923 -3.0923 -0.2974 3.2439 9.5077
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 17.147 1.125 15.247 1.13e-15 ***
## am 7.245 1.764 4.106 0.000285 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 4.902 on 30 degrees of freedom
## Multiple R-squared: 0.3598, Adjusted R-squared: 0.3385
## F-statistic: 16.86 on 1 and 30 DF, p-value: 0.000285
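This adds little information beyond our hypothesis test. Interpreting the intercept and coefficient, we say that, on average, automatic cars get 17.147 MPG and manual transmission cars get 7.245 MPG more. However, with an R-squared of 0.3598 this model explains only about 36% of the variance in MPG, and it does not account for possible confounders.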
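To see why the simple regression mirrors the t-test, the sketch below checks that, with a single 0/1 predictor, the intercept equals the automatic group mean and the slope equals the difference in group means. The object names are hypothetical.

```r
# Simple linear regression of mpg on the 0/1 transmission indicator
fit_simple <- lm(mpg ~ am, data = mtcars)
coef(fit_simple)

# Group means computed directly from the data
auto_mean   <- mean(mtcars$mpg[mtcars$am == 0])
manual_mean <- mean(mtcars$mpg[mtcars$am == 1])

# Intercept = automatic mean; slope = manual mean - automatic mean
all.equal(unname(coef(fit_simple)["(Intercept)"]), auto_mean)
all.equal(unname(coef(fit_simple)["am"]), manual_mean - auto_mean)
```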
Next, we fit a multivariate linear regression for mpg on am, wt, and hp. Since we have two models of the same data, we run an ANOVA to compare the two models and see if they are significantly different.
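The nested-model comparison can be sketched as follows. The formulas match those shown in the ANOVA table, while the object names `fit_simple`, `fit_multi`, and `cmp` are hypothetical.

```r
# Nested models: the second adds wt and hp to the first
fit_simple <- lm(mpg ~ am, data = mtcars)
fit_multi  <- lm(mpg ~ am + wt + hp, data = mtcars)

# F-test of whether the added terms reduce the residual sum of squares
cmp <- anova(fit_simple, fit_multi)
cmp

# p-value of the comparison (second row of the table)
cmp[2, "Pr(>F)"]
```

`anova()` on two `lm` fits is only meaningful when the models are nested and fitted to the same rows, which holds here because both use the full `mtcars` data.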
## Analysis of Variance Table
##
## Model 1: mpg ~ am
## Model 2: mpg ~ am + wt + hp
## Res.Df RSS Df Sum of Sq F Pr(>F)
## 1 30 720.90
## 2 28 180.29 2 540.61 41.979 3.745e-09 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
With a p-value of 3.745e-09, we reject the null hypothesis that wt and hp add nothing to the simple model and conclude that the multivariate model fits significantly better. Before reporting the details of our model, it is important to check the residuals for any signs of non-normality and to examine the residuals vs. fitted values plot. See Figure 5.
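The residual checks mentioned here might look like the sketch below. The report does not show its diagnostic code, so the choice of plots and the added Shapiro-Wilk test are assumptions about one reasonable way to do it.

```r
fit_multi <- lm(mpg ~ am + wt + hp, data = mtcars)
res <- residuals(fit_multi)

# Residuals vs. fitted values: look for curvature or a funnel shape
plot(fitted(fit_multi), res, xlab = "Fitted values", ylab = "Residuals")
abline(h = 0, lty = 2)

# Normal Q-Q plot: points should fall close to the reference line
qqnorm(res)
qqline(res)

# Shapiro-Wilk test as a numeric complement to the Q-Q plot
shapiro.test(res)
```

Calling `plot(fit_multi)` instead would draw R's standard set of four diagnostic plots in one go.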
##
## Call:
## lm(formula = mpg ~ am + wt + hp, data = mtcars)
##
## Residuals:
## Min 1Q Median 3Q Max
## -3.4221 -1.7924 -0.3788 1.2249 5.5317
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 34.002875 2.642659 12.867 2.82e-13 ***
## am 2.083710 1.376420 1.514 0.141268
## wt -2.878575 0.904971 -3.181 0.003574 **
## hp -0.037479 0.009605 -3.902 0.000546 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.538 on 28 degrees of freedom
## Multiple R-squared: 0.8399, Adjusted R-squared: 0.8227
## F-statistic: 48.96 on 3 and 28 DF, p-value: 2.908e-11
Conclusion
This model explains about 84% of the variance (R-squared = 0.8399). Moreover, we see that wt and hp did indeed confound the relationship between am and mpg (mostly wt). Now when we read the coefficient for am, we see that, holding weight and horsepower constant, manual transmission cars get on average 2.084 MPG more than automatic transmission cars. That is far smaller than the initial 7.245 MPG. The p-value of 0.141 for this estimate (β = 2.084) means there is not enough evidence to reject the null hypothesis of no difference: once weight and horsepower are accounted for, a difference this large could plausibly arise by random chance.
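A confidence interval for the am coefficient makes the same point as its p-value. The sketch below uses hypothetical object names. Because the p-value for am is above 0.05, the 95% interval necessarily includes zero.

```r
fit_multi <- lm(mpg ~ am + wt + hp, data = mtcars)

# 95% confidence interval for the adjusted manual-vs-automatic difference
ci_am <- confint(fit_multi, "am", level = 0.95)
ci_am

# p-value for the am coefficient, as shown in the model summary
p_am <- summary(fit_multi)$coefficients["am", "Pr(>|t|)"]
p_am
```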
## Figure 2