The long-running contest between the two transmission technologies, manual and automatic, continues to this day. For example, the following articles capture the debate well:
The following is an attempt to analyse the mtcars data that was extracted from the 1974 Motor Trend US magazine, and comprises fuel consumption and 10 aspects of automobile design and performance for 32 automobiles (1973-74 models).
The analysis uses exploratory data analysis and regression models, and aims to address the following two key questions:
The mtcars dataset is used for the analysis.
Converting some variables to factors:
mtcars$cyl <- factor(mtcars$cyl)
mtcars$vs <- factor(mtcars$vs)
mtcars$am <- factor(mtcars$am)
mtcars$gear <- factor(mtcars$gear)
mtcars$carb <- factor(mtcars$carb)
str(mtcars)
## 'data.frame': 32 obs. of 11 variables:
## $ mpg : num 21 21 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2 ...
## $ cyl : Factor w/ 3 levels "4","6","8": 2 2 1 2 3 2 3 1 1 2 ...
## $ disp: num 160 160 108 258 360 ...
## $ hp : num 110 110 93 110 175 105 245 62 95 123 ...
## $ drat: num 3.9 3.9 3.85 3.08 3.15 2.76 3.21 3.69 3.92 3.92 ...
## $ wt : num 2.62 2.88 2.32 3.21 3.44 ...
## $ qsec: num 16.5 17 18.6 19.4 17 ...
## $ vs : Factor w/ 2 levels "0","1": 1 1 2 2 1 2 1 2 2 2 ...
## $ am : Factor w/ 2 levels "0","1": 2 2 2 1 1 1 1 1 1 1 ...
## $ gear: Factor w/ 3 levels "3","4","5": 2 2 2 1 1 1 1 2 2 2 ...
## $ carb: Factor w/ 6 levels "1","2","3","4",..: 4 4 1 1 2 1 4 2 2 4 ...
Loading the libraries:
library(plyr)
library(ggplot2)
library(stats)
library(car)
library(graphics)
The following figure displays the relation between miles per gallon (mpg) and transmission type:
transmission <- revalue(mtcars$am, c('0'="automatic", '1'="manual"))
ggplot(mtcars, aes(x=transmission, y=mpg, fill=transmission)) +
geom_boxplot() +
xlab("Transmission type") +
ylab("Miles per gallon")
The plot above clearly shows a difference in fuel consumption between manual and automatic transmissions. Next, we fit a regression model that explains the variability of MPG using the transmission type alone.
fit1 <- lm(mpg ~ am, data=mtcars)
summary(fit1)
##
## Call:
## lm(formula = mpg ~ am, data = mtcars)
##
## Residuals:
## Min 1Q Median 3Q Max
## -9.392 -3.092 -0.297 3.244 9.508
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 17.15 1.12 15.25 1.1e-15 ***
## am1 7.24 1.76 4.11 0.00029 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 4.9 on 30 degrees of freedom
## Multiple R-squared: 0.36, Adjusted R-squared: 0.338
## F-statistic: 16.9 on 1 and 30 DF, p-value: 0.000285
Looking at the summary above, we can see that although the coefficients for both the intercept and the transmission type are significant, a model using only the transmission type explains just 35.98% of the MPG variation.
Before drawing any conclusions about the effect of transmission type on fuel efficiency, we look at the pairwise relationships among the variables in the dataset.
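With a single two-level factor as the only predictor, the intercept is the mean MPG of the reference group (automatic) and the am coefficient is the difference between the two group means. The sketch below checks this on a private copy of the data; the names `cars`, `group_means`, `mean_diff`, `slope` and `ci` are illustrative and do not appear in the original analysis.

```r
# Work on a private copy so the session's `mtcars` is left untouched.
cars <- datasets::mtcars
cars$am <- factor(cars$am, levels = c(0, 1),
                  labels = c("automatic", "manual"))

# Mean MPG per transmission type and the raw difference between them.
group_means <- tapply(cars$mpg, cars$am, mean)
mean_diff <- unname(group_means["manual"] - group_means["automatic"])

# The same difference, recovered as the regression coefficient.
fit <- lm(mpg ~ am, data = cars)
slope <- unname(coef(fit)["ammanual"])

# 95% confidence interval for the manual-vs-automatic difference.
ci <- confint(fit, "ammanual", level = 0.95)

print(round(group_means, 2))
print(round(c(mean_diff = mean_diff, slope = slope), 2))
print(round(ci, 2))
```

The interval gives a sense of how uncertain the unadjusted difference is, which the point estimate alone does not convey.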
pairs(mtcars, panel=function(x,y) {
points(x, y)
abline(lm(y ~ x), col="red")
})
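The visual impression from the panels can be cross-checked numerically. The following is a minimal sketch, assuming the original (all-numeric) columns of the dataset, so cyl, vs, am, gear and carb are treated as numeric codes rather than factors; `cars` and `mpg_cor` are illustrative names.

```r
# Private copy of the original data, in which all 11 columns are numeric.
cars <- datasets::mtcars

# Pearson correlation of every other variable with mpg.
mpg_cor <- cor(cars)["mpg", setdiff(names(cars), "mpg")]

# Sort by absolute strength, strongest first.
mpg_cor <- mpg_cor[order(abs(mpg_cor), decreasing = TRUE)]

print(round(mpg_cor, 2))
```

Correlations only describe pairwise linear association, so they motivate a multivariable model but do not replace one.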
Based on the pairs plot above, several variables appear to be highly correlated with the mpg variable. Hence, we build an initial model using all variables and select a subset of predictors by stepwise selection in both directions (backward elimination and forward selection), which step() performs by comparing AIC.
initial_model <- lm(mpg ~ ., data=mtcars)
best_model <- step(initial_model, direction="both", trace=0)
summary(best_model)
##
## Call:
## lm(formula = mpg ~ cyl + hp + wt + am, data = mtcars)
##
## Residuals:
## Min 1Q Median 3Q Max
## -3.939 -1.256 -0.401 1.125 5.051
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 33.7083 2.6049 12.94 7.7e-13 ***
## cyl6 -3.0313 1.4073 -2.15 0.0407 *
## cyl8 -2.1637 2.2843 -0.95 0.3523
## hp -0.0321 0.0137 -2.35 0.0269 *
## wt -2.4968 0.8856 -2.82 0.0091 **
## am1 1.8092 1.3963 1.30 0.2065
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.41 on 26 degrees of freedom
## Multiple R-squared: 0.866, Adjusted R-squared: 0.84
## F-statistic: 33.6 on 5 and 26 DF, p-value: 1.51e-10
par(mfrow = c(2,2))
plot(best_model)
The final model contains four predictors: cyl (number of cylinders), hp (horsepower), wt (weight) and am (transmission type). This model explains 86.58% of the MPG variation. Horsepower, weight and the 6-cylinder level of cyl are significant at alpha=0.05, whereas the 8-cylinder level (p=0.35) and the transmission type (p=0.21) are not. With the other predictors held fixed, manual cars are estimated to get 1.81 more miles per gallon than automatics, but this difference is not statistically distinguishable from zero. The residual plots suggest that the residuals are approximately normally distributed and do not depend on the fitted values.
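One way to make the statement about transmission type concrete is to compare the selected model with the same model without am, and to look at a confidence interval for the am coefficient. This is a sketch rather than part of the original analysis: it refits the model on a private copy of the data, and `cars`, `adjusted`, `reduced`, `am_ci`, `f_test` and `am_p` are illustrative names.

```r
# Private copy with the same factor coding used for the selected model.
cars <- datasets::mtcars
cars$cyl <- factor(cars$cyl)
cars$am  <- factor(cars$am)

# Selected model and the same model with transmission type dropped.
adjusted <- lm(mpg ~ cyl + hp + wt + am, data = cars)
reduced  <- update(adjusted, . ~ . - am)

# 95% confidence interval for the adjusted manual-vs-automatic difference.
am_ci <- confint(adjusted, "am1")

# Partial F-test: does adding am improve on cyl + hp + wt?
f_test <- anova(reduced, adjusted)
am_p <- f_test[["Pr(>F)"]][2]

print(round(am_ci, 2))
print(f_test)
```

Because am contributes a single coefficient, the partial F-test gives the same p-value as the t-test in the summary table. The interval shows that the data are compatible with both a small loss and a moderate gain in MPG for manual cars.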
The data analysis on the mtcars dataset of 1973-74 models reveals some interesting points.
The mtcars dataset used for this analysis comprises data for 1973-1974 models. Transmission type on its own is significantly associated with fuel consumption, but once the number of cylinders, horsepower and weight are accounted for, this analysis was not able to find a significant link between the two. For modern cars, with much more efficient automatic transmissions, it is even less likely that driving a stick shift will save you any money.