Executive Summary

A car’s horsepower and weight together explain approximately 83% of the variation in fuel efficiency (miles per gallon) in the mtcars dataset.

There is no statistically significant indication that the transmission type affects the efficiency of the car.

Introduction

This project analyses the mtcars dataset and addresses the following questions:

  1. Is an automatic or manual transmission better for MPG?
  2. What is the size of the MPG difference between automatic and manual transmissions?

The mtcars dataset consists of 32 car models, each described by 11 variables.

Analysing the Data

The first six rows of the data, as returned by head(mtcars), are as follows:

##                    mpg cyl disp  hp drat    wt  qsec vs am gear carb
## Mazda RX4         21.0   6  160 110 3.90 2.620 16.46  0  1    4    4
## Mazda RX4 Wag     21.0   6  160 110 3.90 2.875 17.02  0  1    4    4
## Datsun 710        22.8   4  108  93 3.85 2.320 18.61  1  1    4    1
## Hornet 4 Drive    21.4   6  258 110 3.08 3.215 19.44  1  0    3    1
## Hornet Sportabout 18.7   8  360 175 3.15 3.440 17.02  0  0    3    2
## Valiant           18.1   6  225 105 2.76 3.460 20.22  1  0    3    1

Examining the relationships between the variables

pairs(~ mpg + disp + hp + wt + drat + qsec , data = mtcars)

Examining the correlation between the variables
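
As a minimal sketch of how the pairwise correlations could be inspected, the snippet below computes a Pearson correlation matrix. The choice of columns (the variables from the pairs plot plus cyl) is an assumption made for illustration, not necessarily the selection used in the original analysis.

```r
# Numeric variables from the pairs plot, plus cyl (assumed selection)
vars <- c("mpg", "cyl", "disp", "hp", "drat", "wt", "qsec")

# cor() computes Pearson correlations by default
cor_mat <- cor(mtcars[, vars])

# Round for readability; large absolute values flag strongly related pairs
round(cor_mat, 2)
```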

Determining the relationship between variables

Given the high correlation between the explanatory variables and the small number of observations, there is a high risk that a regression model including all of them will suffer from multicollinearity, which makes the estimates for individual predictors unreliable. However, provided the basic regression assumptions hold, the overall predictive power of the model remains valid.
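
One common way to gauge multicollinearity is the variance inflation factor (VIF). The sketch below is an illustrative addition rather than part of the original analysis: it treats cyl as numeric for simplicity, and uses the identity that the VIF of each predictor equals the corresponding diagonal element of the inverse correlation matrix of the predictors.

```r
# Candidate predictors (cyl treated as numeric here, an assumption for this sketch)
X <- mtcars[, c("cyl", "disp", "hp", "drat", "wt")]

# VIF_j = 1 / (1 - R_j^2), where R_j^2 comes from regressing predictor j
# on the other predictors; this equals the j-th diagonal element of the
# inverse of the predictors' correlation matrix
vif <- diag(solve(cor(X)))

round(vif, 2)
```

Values well above 1 indicate that a predictor is largely explained by the others. A frequently quoted rule of thumb treats values above 5 or 10 as a warning sign.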

Cylinder count, displacement and horsepower are highly correlated. To check whether the additional variables can be dropped, an F-test compares two nested models: one with horsepower and weight only, and one that also includes cylinder count (as a factor), displacement and rear axle ratio (drat). The hypotheses are:

H0 : Beta(cyl) = Beta(disp) = Beta(drat) = 0

H1 : at least one of Beta(cyl), Beta(disp), Beta(drat) != 0

modelA1 <- lm(mpg ~ hp + wt, data = mtcars)
modelA2 <- lm(mpg ~ as.factor(cyl) + disp + hp + drat + wt, data = mtcars) 
anova(modelA1, modelA2)
## Analysis of Variance Table
## 
## Model 1: mpg ~ hp + wt
## Model 2: mpg ~ as.factor(cyl) + disp + hp + drat + wt
##   Res.Df    RSS Df Sum of Sq      F Pr(>F)
## 1     29 195.05                           
## 2     25 157.42  4    37.632 1.4941 0.2341

Based on the above, with F = 1.4941 (p = 0.23), the null hypothesis that Beta(cyl) = Beta(disp) = Beta(drat) = 0 cannot be rejected at the 5% significance level. The coefficients on disp, cyl and drat are therefore jointly insignificant, and model A1 will be used.
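
To make the F-test less of a black box, the statistic can be rebuilt from the residual sums of squares of the two models. This is a sketch for illustration; the variable names rss_small, rss_large, q and df2 are introduced here and do not appear in the original analysis.

```r
modelA1 <- lm(mpg ~ hp + wt, data = mtcars)
modelA2 <- lm(mpg ~ as.factor(cyl) + disp + hp + drat + wt, data = mtcars)

rss_small <- deviance(modelA1)   # RSS of the restricted model
rss_large <- deviance(modelA2)   # RSS of the unrestricted model

# Number of restrictions: cyl as a factor adds 2 coefficients, disp and drat 1 each
q   <- df.residual(modelA1) - df.residual(modelA2)
df2 <- df.residual(modelA2)      # residual degrees of freedom of the larger model

# F = ((RSS_small - RSS_large) / q) / (RSS_large / df2)
f_stat <- ((rss_small - rss_large) / q) / (rss_large / df2)
p_val  <- pf(f_stat, q, df2, lower.tail = FALSE)

c(F = f_stat, p = p_val)
```

The result should agree with the anova() table above: (37.632 / 4) / (157.42 / 25) is approximately 1.494.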

Performing further diagnostic tests on model A1

Examining the Residuals

par(mfrow=c(2,2))
plot(modelA1)

As can be seen, the residuals do not display any obvious heteroscedasticity.
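
The visual check could be complemented by a formal test. The sketch below is an addition, not part of the original analysis: it implements the studentized (Koenker) form of the Breusch-Pagan test by hand in base R, regressing the squared residuals on the model's predictors and comparing n * R-squared with a chi-squared distribution.

```r
modelA1 <- lm(mpg ~ hp + wt, data = mtcars)

# Auxiliary regression: squared residuals on the same predictors
aux <- lm(resid(modelA1)^2 ~ hp + wt, data = mtcars)

# LM statistic n * R^2; under homoscedasticity it is approximately
# chi-squared with df equal to the number of predictors (2 here)
lm_stat <- nrow(mtcars) * summary(aux)$r.squared
p_val   <- pchisq(lm_stat, df = 2, lower.tail = FALSE)

c(LM = lm_stat, p = p_val)
```

A large p-value would be consistent with the visual impression of constant residual variance. With only 32 observations, the test has limited power.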

Examining modelA1

summary(modelA1)
## 
## Call:
## lm(formula = mpg ~ hp + wt, data = mtcars)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
## -3.941 -1.600 -0.182  1.050  5.854 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 37.22727    1.59879  23.285  < 2e-16 ***
## hp          -0.03177    0.00903  -3.519  0.00145 ** 
## wt          -3.87783    0.63273  -6.129 1.12e-06 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2.593 on 29 degrees of freedom
## Multiple R-squared:  0.8268, Adjusted R-squared:  0.8148 
## F-statistic: 69.21 on 2 and 29 DF,  p-value: 9.109e-12

As can be seen, the model has a high R-squared: approximately 83% of the variation in mpg is explained by the horsepower and weight of the car.
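
For reference, the R-squared figures in the summary can be reproduced from their definitions. This is a small illustrative sketch; the names rss, tss, r2 and adj_r2 are introduced here.

```r
modelA1 <- lm(mpg ~ hp + wt, data = mtcars)

rss <- sum(resid(modelA1)^2)                     # unexplained variation
tss <- sum((mtcars$mpg - mean(mtcars$mpg))^2)    # total variation in mpg

r2 <- 1 - rss / tss                              # share of variation explained

n <- nrow(mtcars)   # 32 observations
p <- 2              # predictors: hp and wt
adj_r2 <- 1 - (1 - r2) * (n - 1) / (n - p - 1)   # penalised for model size

round(c(R2 = r2, adjR2 = adj_r2), 4)
```

These values should match the Multiple R-squared (0.8268) and Adjusted R-squared (0.8148) reported by summary(modelA1).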

Is an automatic or manual transmission better for MPG

Using model A1 in the section above as a base model and testing the hypothesis that Beta(am) = 0 at 5% significance level:

H0 : Beta(am) = 0

H1 : Beta(am) != 0

modelA1 <- lm(mpg ~ hp + wt, data = mtcars)
modelB1 <- lm(mpg ~ hp + wt + am, data = mtcars)
anova(modelB1, modelA1)
## Analysis of Variance Table
## 
## Model 1: mpg ~ hp + wt + am
## Model 2: mpg ~ hp + wt
##   Res.Df    RSS Df Sum of Sq      F Pr(>F)
## 1     28 180.29                           
## 2     29 195.05 -1   -14.757 2.2918 0.1413

The F-test gives F = 2.2918 (p-value = 0.14), so the null hypothesis cannot be rejected at the 5% significance level (nor at the 10% level). There is therefore no statistically significant evidence that a car’s transmission type affects its fuel efficiency (mpg) once horsepower and weight are accounted for.
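
To address the second question, the size of the difference, one option is to read the am coefficient and its confidence interval from model B1. The sketch below is an illustrative addition. It relies on the mtcars coding of am (0 = automatic, 1 = manual), so the coefficient estimates the mpg difference of manual relative to automatic cars at equal horsepower and weight.

```r
modelB1 <- lm(mpg ~ hp + wt + am, data = mtcars)

# Estimate, standard error, t value and p-value for the transmission effect
am_row <- summary(modelB1)$coefficients["am", ]

# 95% confidence interval for the manual-vs-automatic difference in mpg
am_ci <- confint(modelB1, "am", level = 0.95)

am_row
am_ci
```

Because a single coefficient is being tested, the squared t value for am equals the F statistic from the anova() comparison above, and the p-values coincide. Since that p-value (0.14) exceeds 0.05, the 95% interval includes zero.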