A car’s horsepower and weight together explain about 83% of the variation in fuel efficiency (miles per gallon) in the mtcars dataset.
There is no statistically significant evidence that transmission type affects the fuel efficiency of the car.
This project analyses the mtcars dataset and addresses the following questions:
The mtcars dataset consists of 32 car models, each described by 11 variables.
The basic structure of the data is as follows:
##                    mpg cyl disp  hp drat    wt  qsec vs am gear carb
## Mazda RX4         21.0   6  160 110 3.90 2.620 16.46  0  1    4    4
## Mazda RX4 Wag     21.0   6  160 110 3.90 2.875 17.02  0  1    4    4
## Datsun 710        22.8   4  108  93 3.85 2.320 18.61  1  1    4    1
## Hornet 4 Drive    21.4   6  258 110 3.08 3.215 19.44  1  0    3    1
## Hornet Sportabout 18.7   8  360 175 3.15 3.440 17.02  0  0    3    2
## Valiant           18.1   6  225 105 2.76 3.460 20.22  1  0    3    1
Examining the relationships between the variables
pairs(~ mpg + disp + hp + wt + drat + qsec , data = mtcars)
Examining the correlation between the variables
Because the predictor variables are highly correlated with one another and the sample is small (32 observations), a regression model fitted to these data is at high risk of multicollinearity, which makes the estimates for individual predictors unreliable. However, provided the basic regression assumptions hold, the overall predictive power of the model remains valid.
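The degree of multicollinearity can be quantified with variance inflation factors (VIFs). The following is a minimal sketch rather than part of the original analysis: it assumes `cyl` is treated as a numeric variable (the models below use `as.factor(cyl)`), and the names `predictors`, `cor_mat` and `vif` are illustrative only. It relies on the identity that the VIFs are the diagonal of the inverse of the predictor correlation matrix.

```r
# Candidate predictors discussed in this report.
# Assumption: cyl is treated as numeric here for simplicity.
predictors <- mtcars[, c("cyl", "disp", "hp", "drat", "wt")]

# Pairwise Pearson correlations between the predictors
cor_mat <- cor(predictors)
print(round(cor_mat, 2))

# VIF_j = 1 / (1 - R_j^2), where R_j^2 comes from regressing predictor j
# on the remaining predictors. This equals the j-th diagonal element of
# the inverse correlation matrix.
vif <- setNames(diag(solve(cor_mat)), colnames(cor_mat))
print(round(vif, 2))
```

A common rule of thumb treats VIF values above roughly 5 to 10 as a sign of problematic multicollinearity.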
The three variables cylinder number, displacement and horsepower are highly correlated. A nested-model F-test is therefore used to compare a reduced model containing only horsepower and weight with a full model that also includes cylinder number, displacement and rear axle ratio (drat), under the hypotheses:
H0 : Beta(cyl) = Beta(disp) = Beta(drat) = 0
H1 : at least one of Beta(cyl), Beta(disp), Beta(drat) != 0
modelA1 <- lm(mpg ~ hp + wt, data = mtcars)
modelA2 <- lm(mpg ~ as.factor(cyl) + disp + hp + drat + wt, data = mtcars)
anova(modelA1, modelA2)
## Analysis of Variance Table
##
## Model 1: mpg ~ hp + wt
## Model 2: mpg ~ as.factor(cyl) + disp + hp + drat + wt
## Res.Df RSS Df Sum of Sq F Pr(>F)
## 1 29 195.05
## 2 25 157.42 4 37.632 1.4941 0.2341
With F = 1.4941 (p value = 0.23), the null hypothesis that Beta(cyl) = Beta(disp) = Beta(drat) = 0 cannot be rejected at the 5% significance level. The additional terms are therefore not jointly significant, and model A1 will be used.
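The F statistic in the table can be reproduced by hand from the two residual sums of squares. The sketch below uses the rounded values printed in the ANOVA table, so it should agree with the reported F and p value only up to rounding; the variable names are illustrative.

```r
# Values read from the ANOVA table above
rss_reduced <- 195.05    # mpg ~ hp + wt, 29 residual degrees of freedom
rss_full    <- 157.42    # full model, 25 residual degrees of freedom
df_extra    <- 29 - 25   # number of additional coefficients being tested
df_full     <- 25

# F = ((RSS_reduced - RSS_full) / df_extra) / (RSS_full / df_full)
f_stat  <- ((rss_reduced - rss_full) / df_extra) / (rss_full / df_full)

# Upper-tail probability of the F distribution
p_value <- pf(f_stat, df_extra, df_full, lower.tail = FALSE)

print(round(f_stat, 3))
print(round(p_value, 3))
```

Note that four coefficients are tested rather than three, because `as.factor(cyl)` contributes two dummy variables.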
Examining the Residuals
par(mfrow=c(2,2))
plot(modelA1)
As can be seen, the residuals do not display any obvious heteroscedasticity.
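The visual check could be supplemented with a formal test. The sketch below is not part of the original analysis: it hand-computes the studentized (Koenker) form of the Breusch-Pagan test in base R by regressing the squared residuals on the model's predictors. The names `sq_res`, `aux`, `bp_stat` and `bp_p` are illustrative.

```r
modelA1 <- lm(mpg ~ hp + wt, data = mtcars)

# Auxiliary regression: squared residuals on the same predictors
sq_res <- resid(modelA1)^2
aux    <- lm(sq_res ~ hp + wt, data = mtcars)

# Studentized Breusch-Pagan statistic: n * R^2 of the auxiliary regression,
# compared against a chi-squared distribution with one degree of freedom
# per predictor (two here)
n       <- nrow(mtcars)
bp_stat <- n * summary(aux)$r.squared
bp_p    <- pchisq(bp_stat, df = 2, lower.tail = FALSE)

print(c(statistic = bp_stat, p.value = bp_p))
```

A large p value would be consistent with the visual impression of constant residual variance, though with only 32 observations the test has limited power.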
Examining modelA1
summary(modelA1)
##
## Call:
## lm(formula = mpg ~ hp + wt, data = mtcars)
##
## Residuals:
## Min 1Q Median 3Q Max
## -3.941 -1.600 -0.182 1.050 5.854
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 37.22727 1.59879 23.285 < 2e-16 ***
## hp -0.03177 0.00903 -3.519 0.00145 **
## wt -3.87783 0.63273 -6.129 1.12e-06 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.593 on 29 degrees of freedom
## Multiple R-squared: 0.8268, Adjusted R-squared: 0.8148
## F-statistic: 69.21 on 2 and 29 DF, p-value: 9.109e-12
As can be seen, the model has a high R-squared (0.8268), meaning that approximately 83% of the variation in mpg can be explained by the horsepower and weight of the car.
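To make the "variation explained" interpretation concrete, R-squared can be recomputed from the residual and total sums of squares. This is a minimal sketch with illustrative variable names; it should reproduce the values reported in the summary output above.

```r
modelA1 <- lm(mpg ~ hp + wt, data = mtcars)

# Residual sum of squares: variation left unexplained by the model
rss <- sum(resid(modelA1)^2)

# Total sum of squares: variation of mpg around its mean
tss <- sum((mtcars$mpg - mean(mtcars$mpg))^2)

# R-squared is the share of total variation explained by the model
r_squared <- 1 - rss / tss

# The adjusted version penalises the number of predictors (p = 2 here)
n <- nrow(mtcars)
p <- 2
adj_r_squared <- 1 - (1 - r_squared) * (n - 1) / (n - p - 1)

print(round(c(r_squared, adj_r_squared), 4))
```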
Using model A1 from the section above as the base model, the hypothesis that Beta(am) = 0 is tested at the 5% significance level:
H0 : Beta(am) = 0
H1 : Beta(am) != 0
modelA1 <- lm(mpg ~ hp + wt, data = mtcars)
modelB1 <- lm(mpg ~ hp + wt + am, data = mtcars)
anova(modelB1, modelA1)
## Analysis of Variance Table
##
## Model 1: mpg ~ hp + wt + am
## Model 2: mpg ~ hp + wt
## Res.Df RSS Df Sum of Sq F Pr(>F)
## 1 28 180.29
## 2 29 195.05 -1 -14.757 2.2918 0.1413
The F test gives F = 2.2918 (p value = 0.14), so the null hypothesis cannot be rejected at the 5% significance level. There is therefore no statistically significant evidence that a car’s transmission type affects its fuel efficiency (mpg) once horsepower and weight are accounted for.
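Because only one coefficient is being tested, this F test is equivalent to the t test on `am` in the larger model: the F statistic is the square of the t statistic and the two p values coincide. The sketch below checks this; the names `am_row`, `t_value` and `f_value` are illustrative.

```r
modelA1 <- lm(mpg ~ hp + wt, data = mtcars)
modelB1 <- lm(mpg ~ hp + wt + am, data = mtcars)

# t test for the am coefficient in the larger model
am_row  <- summary(modelB1)$coefficients["am", ]
t_value <- am_row[["t value"]]

# F statistic from the nested-model comparison (second row of the table)
f_value <- anova(modelA1, modelB1)$F[2]

# For a single restriction, F equals t squared
print(c(t_squared = t_value^2, F = f_value))
print(am_row[["Pr(>|t|)"]])
```

Passing the smaller model first, as done here, gives a positive Df and Sum of Sq in the ANOVA table while leaving F and the p value unchanged.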