This report is the concluding course project of the Regression Models course in the Data Science Specialization by Johns Hopkins University on Coursera.org.
It examines the relationship between a car's transmission type (manual or automatic) and its fuel efficiency (miles per gallon) for the car models in the mtcars dataset, while also considering possible effects of other variables on fuel efficiency.
You work for Motor Trend, a magazine about the automobile industry. Looking at a data set of a collection of cars, they are interested in exploring the relationship between a set of variables and miles per gallon (MPG) (outcome). They are particularly interested in the following questions:

- Is an automatic or manual transmission better for MPG?
- Quantify the MPG difference between automatic and manual transmission.
The data was extracted from the 1974 Motor Trend US magazine. It comprises the following aspects of automobile design and performance for 32 automobile models.
| Variable Name | Class | Range | Description |
|---|---|---|---|
| mpg | numeric | 10.4 - 33.9 | Fuel Efficiency in Miles per Gallon |
| cyl | numeric | 4,6,8 | Number of Cylinders |
| disp | numeric | 71.1 - 472 | Displacement in Cubic Inches |
| hp | numeric | 52 - 335 | Gross Horsepower |
| drat | numeric | 2.76 - 4.93 | Rear Axle Ratio |
| wt | numeric | 1.513 - 5.424 | Weight in 1000 lbs |
| qsec | numeric | 14.5 - 22.9 | 1/4 Mile Time in Seconds |
| vs | numeric | 0,1 | Engine Type: 0 = V-shaped, 1 = straight |
| am | numeric | 0,1 | Transmission: 0 = automatic, 1 = manual |
| gear | numeric | 3,4,5 | Number of Gears |
| carb | numeric | 1 - 8 | Number of Carburetors |
To analyze the data effectively, the raw data was processed into a tidy form. The processing steps were (code in Appendix A: Data Processing):

- Converting vs and am into factor variables (am with the labels Automatic and Manual)

The tidy dataset has the following form:
```
## mpg cyl disp hp drat wt qsec vs am gear carb
## Mazda RX4 21.0 6 160 110 3.90 2.620 16.46 0 Manual 4 4
## Mazda RX4 Wag 21.0 6 160 110 3.90 2.875 17.02 0 Manual 4 4
## Datsun 710 22.8 4 108 93 3.85 2.320 18.61 1 Manual 4 1
```
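The processing step can be sketched in R as follows. This is a minimal sketch, assuming the tidying consisted only of converting vs and am to factors with the labels visible in the output above; the object name mtcars_tidy is hypothetical (the report itself appears to overwrite mtcars, since later code calls lm(..., data = mtcars) and still reports an amManual term).

```r
# Hypothetical tidy copy of R's built-in mtcars data set
mtcars_tidy <- datasets::mtcars

# Engine type: keep the 0/1 codes as factor levels (yields the model term vs1)
mtcars_tidy$vs <- factor(mtcars_tidy$vs)

# Transmission: 0 = automatic, 1 = manual (yields the model term amManual)
mtcars_tidy$am <- factor(mtcars_tidy$am,
                         levels = c(0, 1),
                         labels = c("Automatic", "Manual"))

# Inspect the converted columns and the first rows
str(mtcars_tidy[, c("vs", "am")])
head(mtcars_tidy, 3)
```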
When manual and automatic transmissions are compared without taking any other factor into account, manual cars appear to be more fuel efficient: they average about 24.4 miles per gallon, compared with about 17.1 miles per gallon for automatic cars.
Regressing mpg on transmission type (am) alone shows a significant difference between automatic and manual transmission (p < 0.001). Manual cars travel about 7.2 more miles per gallon than automatic cars (95% CI: [3.64, 10.85]).
```
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 17.147368 1.124603 15.247492 1.133983e-15
## amManual 7.244939 1.764422 4.106127 2.850207e-04
```
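The group means and the simple model can be reproduced with base R's aggregate(), lm() and confint(). The sketch below rebuilds the tidy data under the same assumption as before (am converted to a factor with levels Automatic and Manual; mtcars_tidy is a hypothetical name). Because am is the only predictor, the intercept is the automatic group mean and the amManual coefficient is the difference between the two group means.

```r
# Hypothetical tidy data: transmission as a labeled factor
mtcars_tidy <- datasets::mtcars
mtcars_tidy$am <- factor(mtcars_tidy$am, levels = c(0, 1),
                         labels = c("Automatic", "Manual"))

# Mean mpg per transmission type
aggregate(mpg ~ am, data = mtcars_tidy, FUN = mean)

# Simple linear model with transmission as the only predictor
fit_am <- lm(mpg ~ am, data = mtcars_tidy)
summary(fit_am)$coefficients

# 95% confidence intervals for intercept and amManual
confint(fit_am)
```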
However, this simple model does not take any of the other variables into account, and they may explain away the difference just observed. When all variables in the dataset are included, for example, the estimated advantage of manual transmission shrinks to about 2.5 miles per gallon and is no longer significant (p = 0.23).
```
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 12.30337416 18.71788443 0.6573058 0.51812440
## cyl -0.11144048 1.04502336 -0.1066392 0.91608738
## disp 0.01333524 0.01785750 0.7467585 0.46348865
## hp -0.02148212 0.02176858 -0.9868407 0.33495531
## drat 0.78711097 1.63537307 0.4813036 0.63527790
## wt -3.71530393 1.89441430 -1.9611887 0.06325215
## qsec 0.82104075 0.73084480 1.1234133 0.27394127
## vs1 0.31776281 2.10450861 0.1509915 0.88142347
## amManual 2.52022689 2.05665055 1.2254035 0.23398971
## gear 0.65541302 1.49325996 0.4389142 0.66520643
## carb -0.19941925 0.82875250 -0.2406258 0.81217871
```
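The full model uses the formula shorthand mpg ~ ., which regresses mpg on every other column of the data frame. A minimal sketch, again assuming vs and am are converted to factors (mtcars_tidy is a hypothetical name):

```r
# Hypothetical tidy data: vs and am as factors
mtcars_tidy <- datasets::mtcars
mtcars_tidy$vs <- factor(mtcars_tidy$vs)
mtcars_tidy$am <- factor(mtcars_tidy$am, levels = c(0, 1),
                         labels = c("Automatic", "Manual"))

# Regress mpg on all ten remaining variables
fit_all <- lm(mpg ~ ., data = mtcars_tidy)
round(summary(fit_all)$coefficients, 4)
```

With ten predictors, several of them strongly correlated with each other (cyl, disp, hp and wt, for instance), fitted to only 32 cars, the standard errors are large, and none of the individual coefficients reaches significance at the 5% level.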
To decide which variables to include in a comprehensive model, the following section examines the effects of multiple variables on mpg, as well as possible collinearity among these variables.
The model selection process consisted of the following consecutive steps (see Appendix B: Model Selection for details):

1. Akaike's Information Criterion (AIC) was computed with the step() function, starting from the full model, to determine the set of variables that yields the best (lowest) AIC value, i.e. the best model. The resulting model includes the variables wt, qsec and am, i.e. the weight, quarter-mile time (a measure of acceleration) and transmission type of a car.
2. To verify that no other variable adds information to the model containing these three variables, a sequence of nested models, each adding one of the remaining variables, was compared with ANOVA. None of the additional variables significantly improves the model identified in step 1 (all p > 0.2).
3. To rule out collinearity among wt, qsec and am, variance inflation factors (VIF) were checked. With VIF values between 1.36 and 2.54, collinearity does not appear to be a concern.
4. To check the remaining assumptions (linearity, homoscedasticity and normality of the residuals), residual plots were examined (Appendix C: Residual Plots). All assumptions appear to be met.
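The residual checks of step 4 can be sketched with base R's plot() method for lm objects, which draws the residuals-vs-fitted plot (linearity, homoscedasticity) and the normal Q-Q plot (normality). The Shapiro-Wilk test is an added numerical complement, not something the report states it used; the Appendix C code itself is not reproduced here, and mtcars_tidy is a hypothetical name.

```r
# Hypothetical tidy data: transmission as a labeled factor
mtcars_tidy <- datasets::mtcars
mtcars_tidy$am <- factor(mtcars_tidy$am, levels = c(0, 1),
                         labels = c("Automatic", "Manual"))

fit <- lm(mpg ~ wt + qsec + am, data = mtcars_tidy)

# Residuals vs fitted (which = 1) and normal Q-Q plot (which = 2), side by side
op <- par(mfrow = c(1, 2))
plot(fit, which = 1)
plot(fit, which = 2)
par(op)

# Numerical complement to the Q-Q plot: Shapiro-Wilk normality test on residuals
res <- residuals(fit)
sw <- shapiro.test(res)
sw
```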
That leaves us with the following model:
```
##
## Call:
## lm(formula = mpg ~ wt + qsec + am, data = mtcars)
##
## Residuals:
## Min 1Q Median 3Q Max
## -3.4811 -1.5555 -0.7257 1.4110 4.6610
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 9.6178 6.9596 1.382 0.177915
## wt -3.9165 0.7112 -5.507 6.95e-06 ***
## qsec 1.2259 0.2887 4.247 0.000216 ***
## amManual 2.9358 1.4109 2.081 0.046716 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.459 on 28 degrees of freedom
## Multiple R-squared: 0.8497, Adjusted R-squared: 0.8336
## F-statistic: 52.75 on 3 and 28 DF, p-value: 1.21e-11
```

```
## 2.5 % 97.5 %
## (Intercept) -4.63829946 23.873860
## wt -5.37333423 -2.459673
## qsec 0.63457320 1.817199
## amManual 0.04573031 5.825944
```
The model selected in the Model Selection section explains about 83% of the variance in mpg (adjusted R² = 0.8336). The overall F-test is highly significant (F = 52.75 on 3 and 28 DF, p < 0.001), which indicates a good fit. The model further shows that wt, qsec and am each have an effect on mpg that is either highly significant (wt and qsec: p < 0.001) or significant (am: p < 0.05):
- wt has a decreasing effect on mpg. For every additional 1000 lbs of weight, mpg decreases by 2.46 to 5.37 miles per gallon (95% confidence interval), holding qsec and am constant.
- qsec has an increasing effect on mpg: cars that need longer to cover a quarter mile, i.e. accelerate more slowly, tend to travel farther per gallon. For every additional second of quarter-mile time, mpg increases by 0.63 to 1.82 miles per gallon (95% confidence interval), holding wt and am constant.
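The quantities quoted above can be extracted from the fitted model, and the meaning of a slope can be checked with predict(): two cars that differ only by one unit of wt (1000 lbs) differ in predicted mpg by exactly the wt coefficient. The two example cars are hypothetical, and mtcars_tidy is assumed as before.

```r
# Hypothetical tidy data: transmission as a labeled factor
mtcars_tidy <- datasets::mtcars
mtcars_tidy$am <- factor(mtcars_tidy$am, levels = c(0, 1),
                         labels = c("Automatic", "Manual"))

fit <- lm(mpg ~ wt + qsec + am, data = mtcars_tidy)
s <- summary(fit)

# Share of variance explained, adjusted for the number of predictors
s$adj.r.squared

# p-value of the overall F-test (statistic on numdf and dendf degrees of freedom)
f <- s$fstatistic
pf(f[["value"]], f[["numdf"]], f[["dendf"]], lower.tail = FALSE)

# Two hypothetical automatic cars that differ only in weight, by 1 unit (1000 lbs)
new_cars <- data.frame(wt = c(3, 4), qsec = 18,
                       am = factor("Automatic", levels = levels(mtcars_tidy$am)))
pred <- predict(fit, newdata = new_cars)
diff(pred)  # equals the wt coefficient

# 95% confidence intervals for the wt and qsec slopes
confint(fit)[c("wt", "qsec"), ]
```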
The effect of transmission type is also detectable and leads to the following answers to the initial questions:
Is an automatic or manual transmission better for MPG?
A manual transmission is better for MPG: holding weight and quarter-mile time constant, manual cars travel on average about 2.94 more miles per gallon than automatic cars.
Quantify the MPG difference between automatic and manual transmission.
The estimated advantage of manual over automatic transmission is 2.94 miles per gallon, with a 95% confidence interval of 0.05 to 5.83 miles per gallon.
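The interval for amManual comes from confint(), which for a linear model is the estimate plus or minus the 97.5% t quantile (28 residual degrees of freedom here) times the standard error. A minimal sketch reproducing it both ways, assuming am is a factor as before (mtcars_tidy is a hypothetical name):

```r
# Hypothetical tidy data: transmission as a labeled factor
mtcars_tidy <- datasets::mtcars
mtcars_tidy$am <- factor(mtcars_tidy$am, levels = c(0, 1),
                         labels = c("Automatic", "Manual"))

fit <- lm(mpg ~ wt + qsec + am, data = mtcars_tidy)

# 95% confidence interval for the manual-vs-automatic difference
ci_am <- confint(fit, parm = "amManual", level = 0.95)
ci_am

# Same interval by hand: estimate +/- t quantile * standard error
est <- coef(summary(fit))["amManual", "Estimate"]
se  <- coef(summary(fit))["amManual", "Std. Error"]
ci_manual <- est + c(-1, 1) * qt(0.975, df = df.residual(fit)) * se
ci_manual
```

The lower bound lies close to zero, so the size of the manual-transmission advantage is estimated only imprecisely.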
```r
# Stepwise model selection by AIC, starting from the full model
step(lm(mpg ~ ., data = mtcars), direction = "both", trace = 0)
##
## Call:
## lm(formula = mpg ~ wt + qsec + am, data = mtcars)
##
## Coefficients:
## (Intercept) wt qsec amManual
## 9.618 -3.917 1.226 2.936
```
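step() ranks candidate models by AIC as computed by extractAIC(), which for a linear model equals n log(RSS/n) + 2 edf, where edf is the number of estimated coefficients; lower is better. The sketch below, assuming the same factor conversion (mtcars_tidy is a hypothetical name), compares the full and selected models and recomputes the selected model's AIC by hand.

```r
# Hypothetical tidy data: vs and am as factors
mtcars_tidy <- datasets::mtcars
mtcars_tidy$vs <- factor(mtcars_tidy$vs)
mtcars_tidy$am <- factor(mtcars_tidy$am, levels = c(0, 1),
                         labels = c("Automatic", "Manual"))

fit_full <- lm(mpg ~ ., data = mtcars_tidy)
fit_step <- step(fit_full, direction = "both", trace = 0)
formula(fit_step)

# extractAIC() returns c(edf, AIC); step() compares models on the second value
aic_full <- extractAIC(fit_full)
aic_step <- extractAIC(fit_step)
rbind(full = aic_full, selected = aic_step)

# Same AIC by hand for the selected model: n * log(RSS / n) + 2 * edf
n <- nrow(mtcars_tidy)
n * log(deviance(fit_step) / n) + 2 * aic_step[1]
```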
```r
# Nested models: start from the AIC-selected model and add one variable at a time
fit1 <- lm(mpg ~ wt + qsec + am, mtcars)
fit2 <- lm(mpg ~ wt + qsec + am + cyl, mtcars)
fit3 <- lm(mpg ~ wt + qsec + am + cyl + disp, mtcars)
fit4 <- lm(mpg ~ wt + qsec + am + cyl + disp + hp, mtcars)
fit5 <- lm(mpg ~ wt + qsec + am + cyl + disp + hp + drat, mtcars)
fit6 <- lm(mpg ~ wt + qsec + am + cyl + disp + hp + drat + vs, mtcars)
fit7 <- lm(mpg ~ wt + qsec + am + cyl + disp + hp + drat + vs + gear, mtcars)
fit8 <- lm(mpg ~ wt + qsec + am + cyl + disp + hp + drat + vs + gear + carb, mtcars)
# Sequential F-tests comparing each model with the one before it
anova(fit1, fit2, fit3, fit4, fit5, fit6, fit7, fit8)
```

```
## Analysis of Variance Table
##
## Model 1: mpg ~ wt + qsec + am
## Model 2: mpg ~ wt + qsec + am + cyl
## Model 3: mpg ~ wt + qsec + am + cyl + disp
## Model 4: mpg ~ wt + qsec + am + cyl + disp + hp
## Model 5: mpg ~ wt + qsec + am + cyl + disp + hp + drat
## Model 6: mpg ~ wt + qsec + am + cyl + disp + hp + drat + vs
## Model 7: mpg ~ wt + qsec + am + cyl + disp + hp + drat + vs + gear
## Model 8: mpg ~ wt + qsec + am + cyl + disp + hp + drat + vs + gear + carb
## Res.Df RSS Df Sum of Sq F Pr(>F)
## 1 28 169.29
## 2 27 167.78 1 1.5011 0.2137 0.6486
## 3 26 161.41 1 6.3709 0.9071 0.3517
## 4 25 150.99 1 10.4228 1.4840 0.2367
## 5 24 149.09 1 1.9013 0.2707 0.6083
## 6 23 148.87 1 0.2170 0.0309 0.8621
## 7 22 147.90 1 0.9717 0.1384 0.7137
## 8 21 147.49 1 0.4067 0.0579 0.8122
```
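The eight nested formulas above can also be built programmatically with reformulate(), which avoids copy-paste errors if the sequence of added variables changes. This is an alternative sketch, not the report's code; the variable order matches fit2 through fit8, and mtcars_tidy is a hypothetical name.

```r
# Hypothetical tidy data: vs and am as factors
mtcars_tidy <- datasets::mtcars
mtcars_tidy$vs <- factor(mtcars_tidy$vs)
mtcars_tidy$am <- factor(mtcars_tidy$am, levels = c(0, 1),
                         labels = c("Automatic", "Manual"))

base  <- c("wt", "qsec", "am")
extra <- c("cyl", "disp", "hp", "drat", "vs", "gear", "carb")

# Model k adds the first k extra variables to the base model (k = 0 is the base)
fits <- lapply(0:length(extra), function(k) {
  lm(reformulate(c(base, extra[seq_len(k)]), response = "mpg"), data = mtcars_tidy)
})

# Sequential F-tests, equivalent to anova(fit1, ..., fit8)
nested_anova <- do.call(anova, fits)
nested_anova
```

Each row of the resulting table tests the newly added variable given all variables added before it, so the individual p-values depend on the order in which the variables enter.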
```
## wt qsec am
## 2.482952 1.364339 2.541437
```
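The values above look like the output of vif() from the car package, although the call itself is not shown. For terms with one degree of freedom, the VIF of predictor j is 1/(1 - R_j^2), where R_j^2 comes from regressing predictor j on the other predictors. The sketch below computes this directly in base R as a cross-check; mtcars_tidy is a hypothetical name, and am enters through its 0/1 dummy column amManual.

```r
# Hypothetical tidy data: transmission as a labeled factor
mtcars_tidy <- datasets::mtcars
mtcars_tidy$am <- factor(mtcars_tidy$am, levels = c(0, 1),
                         labels = c("Automatic", "Manual"))

fit <- lm(mpg ~ wt + qsec + am, data = mtcars_tidy)

# Predictor columns of the design matrix, without the intercept
X <- model.matrix(fit)[, -1]

# VIF_j = 1 / (1 - R_j^2), with R_j^2 from regressing column j on the others
vif_manual <- sapply(colnames(X), function(j) {
  others <- X[, colnames(X) != j, drop = FALSE]
  r2 <- summary(lm(X[, j] ~ others))$r.squared
  1 / (1 - r2)
})
round(vif_manual, 6)
```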