Which is better, manual or automatic? How do they work? Which one should you choose, and why? These questions are asked by almost everyone who wishes to buy a car. It is commonly believed that manual transmission (MT) cars use less fuel and give the driver more control over the car but are less convenient, while automatic transmission (AT) cars burn a little more gas and provide less control but are easier to use.
Looking at the “mtcars” data set, we are interested in exploring the relationship between a set of variables and Miles Per Gallon (MPG).
We are particularly interested in the following two questions:
1. Is an automatic or manual transmission better for MPG?
2. How large is the MPG difference between automatic and manual transmissions?
Our conclusions are summarized below:
1. Our analysis indicates that manual transmission is better for MPG.
2. Our model explains about 85% of the variance in Miles Per Gallon (MPG) (adjusted R-squared 0.83). It estimates that manual transmission vehicles get 2.94 MPG more than automatic transmission vehicles of the same weight and quarter-mile time.
We will explore the dataset and try to answer the above questions using exploratory data analysis and regression models.
library(knitr)
library(ggplot2)
dim(mtcars)
## [1] 32 11
str(mtcars)
## 'data.frame': 32 obs. of 11 variables:
## $ mpg : num 21 21 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2 ...
## $ cyl : num 6 6 4 6 8 6 8 4 4 6 ...
## $ disp: num 160 160 108 258 360 ...
## $ hp : num 110 110 93 110 175 105 245 62 95 123 ...
## $ drat: num 3.9 3.9 3.85 3.08 3.15 2.76 3.21 3.69 3.92 3.92 ...
## $ wt : num 2.62 2.88 2.32 3.21 3.44 ...
## $ qsec: num 16.5 17 18.6 19.4 17 ...
## $ vs : num 0 0 1 1 0 1 0 1 1 1 ...
## $ am : num 1 1 1 0 0 0 0 0 0 0 ...
## $ gear: num 4 4 4 3 3 3 3 4 4 4 ...
## $ carb: num 4 4 1 1 2 1 4 2 2 4 ...
The numeric variable ‘am’ indicates whether a car has an automatic transmission (0) or a manual transmission (1).
We will convert the variable to a factor with levels AT and MT for easy classification.
mtcars$am <- as.factor(mtcars$am)
levels(mtcars$am) <- c("AT", "MT")
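As a quick sanity check on the recoding, the conversion can also be done in one step and the group sizes tabulated. This is only a sketch: the names `cars` and `am_counts` are introduced here for illustration and are not part of the original analysis, and the code works on a copy so that `mtcars` itself is not changed.

```r
# Work on a copy so the built-in dataset is left untouched.
cars <- datasets::mtcars

# In mtcars, am = 0 means automatic and am = 1 means manual,
# so the labels must be given in that order.
cars$am <- factor(cars$am, levels = c(0, 1), labels = c("AT", "MT"))

# Count how many cars fall into each transmission group.
am_counts <- table(cars$am)
print(am_counts)
```

Giving `levels` and `labels` explicitly makes the 0/1 to AT/MT mapping visible in the code instead of relying on the default ordering of the factor levels.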
hist(mtcars$mpg,breaks=12, col="green", xlab="Miles Per Gallon", main="MPG Histogram")
A boxplot was created to examine the relationship between mpg and transmission type.
boxplot(mpg ~ am, data=mtcars, xlab="Transmission Type", ylab="Miles per Gallon",
main="Automatic versus Manual Transmission MPG", col="yellow")
A Welch two-sample t-test was run to obtain the difference in mean mpg between automatic and manual transmission vehicles, together with its confidence interval. The null hypothesis is that the two groups have the same mean mpg; the alternative is that the means differ.
mpg.at <- mtcars[mtcars$am == "AT",]$mpg
mpg.mt <- mtcars[mtcars$am == "MT",]$mpg
t.test(mpg.at, mpg.mt)
##
## Welch Two Sample t-test
##
## data: mpg.at and mpg.mt
## t = -3.7671, df = 18.332, p-value = 0.001374
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -11.280194 -3.209684
## sample estimates:
## mean of x mean of y
## 17.14737 24.39231
The p-value is 0.001374, so we reject the null hypothesis and conclude that automatic transmission vehicles have a lower mean mpg than manual transmission vehicles (17.15 versus 24.39). This comparison assumes that all other characteristics of the automatic and manual cars are the same.
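The same Welch test can be written with the formula interface of `t.test`, which avoids building the two vectors by hand and makes it easy to pull out individual pieces of the result. The following is a minimal sketch; the names `cars`, `welch` and `diff_means` are illustrative and do not appear in the original analysis.

```r
# Recreate the recoded data on a copy of mtcars.
cars <- datasets::mtcars
cars$am <- factor(cars$am, levels = c(0, 1), labels = c("AT", "MT"))

# Welch two-sample t-test (unequal variances is the default in t.test).
welch <- t.test(mpg ~ am, data = cars)

# The result is a list; the interval and p-value can be extracted directly.
print(welch$conf.int)
print(welch$p.value)

# Difference in group means, MT minus AT.
diff_means <- unname(welch$estimate[2] - welch$estimate[1])
print(diff_means)
```

Note that the confidence interval reported by `t.test` is for AT minus MT (the first factor level minus the second), which is why both of its bounds are negative.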
We fit a simple regression model to analyse this further.
fit <- lm(mpg~am, data = mtcars)
summary(fit)
##
## Call:
## lm(formula = mpg ~ am, data = mtcars)
##
## Residuals:
## Min 1Q Median 3Q Max
## -9.3923 -3.0923 -0.2974 3.2439 9.5077
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 17.147 1.125 15.247 1.13e-15 ***
## amMT 7.245 1.764 4.106 0.000285 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 4.902 on 30 degrees of freedom
## Multiple R-squared: 0.3598, Adjusted R-squared: 0.3385
## F-statistic: 16.86 on 1 and 30 DF, p-value: 0.000285
The coefficient for manual transmission is significant (p-value = 0.000285), but the R-squared value is only 0.3598. This means that the model explains just 35.98% of the variance in mpg, so we need to include other factors.
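To make the R-squared figure concrete, it can be recomputed from the residuals as one minus the ratio of the residual sum of squares to the total sum of squares. The sketch below also checks that, with a single two-level factor as predictor, the `amMT` coefficient is simply the difference between the two group means. The names `cars`, `rss`, `tss`, `r_squared`, `group_means` and `slope` are introduced here for illustration.

```r
# Recreate the recoded data on a copy of mtcars.
cars <- datasets::mtcars
cars$am <- factor(cars$am, levels = c(0, 1), labels = c("AT", "MT"))

fit <- lm(mpg ~ am, data = cars)

# R-squared = 1 - RSS / TSS
rss <- sum(residuals(fit)^2)                   # residual sum of squares
tss <- sum((cars$mpg - mean(cars$mpg))^2)      # total sum of squares
r_squared <- 1 - rss / tss
print(r_squared)

# With one binary predictor, the slope is the difference in group means.
group_means <- tapply(cars$mpg, cars$am, mean)
slope <- unname(coef(fit)["amMT"])
print(group_means)
print(slope)
```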
A multivariable regression model was then fit. The step function was used to find the best model.
stepmodel = step(lm(data = mtcars, mpg ~ .),trace=0,steps=10000)
summary(stepmodel)
##
## Call:
## lm(formula = mpg ~ wt + qsec + am, data = mtcars)
##
## Residuals:
## Min 1Q Median 3Q Max
## -3.4811 -1.5555 -0.7257 1.4110 4.6610
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 9.6178 6.9596 1.382 0.177915
## wt -3.9165 0.7112 -5.507 6.95e-06 ***
## qsec 1.2259 0.2887 4.247 0.000216 ***
## amMT 2.9358 1.4109 2.081 0.046716 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.459 on 28 degrees of freedom
## Multiple R-squared: 0.8497, Adjusted R-squared: 0.8336
## F-statistic: 52.75 on 3 and 28 DF, p-value: 1.21e-11
The R-squared value is 0.85 (adjusted 0.83), which means that the model explains about 85% of the variation in mpg and fits the data well. In addition to transmission, the weight of the vehicle (wt) and its quarter-mile time (qsec), a measure of acceleration, were retained as the variables that best explain the variation in mpg.
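By default, `step` starts from the model it is given and drops or adds terms one at a time as long as doing so lowers the AIC. The sketch below reruns the selection on a copy of the data and compares the AIC of the full and selected models; the names `cars`, `full`, `selected` and `kept_terms` are illustrative only.

```r
# Recreate the recoded data on a copy of mtcars.
cars <- datasets::mtcars
cars$am <- factor(cars$am, levels = c(0, 1), labels = c("AT", "MT"))

# Full model with every other column as a predictor.
full <- lm(mpg ~ ., data = cars)

# Stepwise selection by AIC; trace = 0 suppresses the step-by-step log.
selected <- step(full, trace = 0)

# Which terms survived, and how do the two models compare on AIC?
kept_terms <- attr(terms(selected), "term.labels")
print(kept_terms)
print(c(full = AIC(full), selected = AIC(selected)))
```

A lower AIC for the selected model reflects a better trade-off between fit and number of parameters. With only 32 observations, stepwise selection can be sensitive to small changes in the data, so the chosen variables are best read as a reasonable summary and not as the only defensible model.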
A model with the three variables wt, qsec and am was fit and compared with the simple model.
bestfit <- lm(mpg~am + wt + qsec, data = mtcars)
anova(fit, bestfit)
## Analysis of Variance Table
##
## Model 1: mpg ~ am
## Model 2: mpg ~ am + wt + qsec
## Res.Df RSS Df Sum of Sq F Pr(>F)
## 1 30 720.90
## 2 28 169.29 2 551.61 45.618 1.55e-09 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
This model captures about 85% of the overall variation in mpg. With a p-value of 1.55e-09, we reject the null hypothesis and conclude that the multivariable model fits significantly better than the simple linear regression model.
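The F statistic in the ANOVA table compares the drop in residual sum of squares per added parameter with the residual variance of the larger model. As a sketch, it can be reproduced by hand from the two fits; the names `cars`, `rss_small`, `rss_big`, `df_diff`, `f_stat` and `p_value` are introduced here for illustration.

```r
# Recreate the recoded data on a copy of mtcars.
cars <- datasets::mtcars
cars$am <- factor(cars$am, levels = c(0, 1), labels = c("AT", "MT"))

fit     <- lm(mpg ~ am, data = cars)
bestfit <- lm(mpg ~ am + wt + qsec, data = cars)

# Residual sums of squares of the two nested models.
rss_small <- deviance(fit)
rss_big   <- deviance(bestfit)

# Number of parameters added by the larger model.
df_diff <- df.residual(fit) - df.residual(bestfit)

# F = (drop in RSS per added parameter) / (residual variance of larger model)
f_stat  <- ((rss_small - rss_big) / df_diff) / (rss_big / df.residual(bestfit))
p_value <- pf(f_stat, df_diff, df.residual(bestfit), lower.tail = FALSE)

print(f_stat)
print(p_value)
```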
summary(bestfit)
##
## Call:
## lm(formula = mpg ~ am + wt + qsec, data = mtcars)
##
## Residuals:
## Min 1Q Median 3Q Max
## -3.4811 -1.5555 -0.7257 1.4110 4.6610
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 9.6178 6.9596 1.382 0.177915
## amMT 2.9358 1.4109 2.081 0.046716 *
## wt -3.9165 0.7112 -5.507 6.95e-06 ***
## qsec 1.2259 0.2887 4.247 0.000216 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.459 on 28 degrees of freedom
## Multiple R-squared: 0.8497, Adjusted R-squared: 0.8336
## F-statistic: 52.75 on 3 and 28 DF, p-value: 1.21e-11
This model explains about 85% of the variance in miles per gallon (mpg). It also shows that manual transmission vehicles get 2.94 mpg more than automatic transmission vehicles of the same weight and quarter-mile time (p-value = 0.047).
Thus, we can conclude that manual transmission is better for mpg.
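To quantify the uncertainty around the estimated difference, a confidence interval for the transmission coefficient can be taken from the fitted model with `confint`. This is a minimal sketch; the names `cars`, `am_estimate` and `am_ci` are illustrative and not part of the original analysis.

```r
# Recreate the recoded data on a copy of mtcars.
cars <- datasets::mtcars
cars$am <- factor(cars$am, levels = c(0, 1), labels = c("AT", "MT"))

bestfit <- lm(mpg ~ am + wt + qsec, data = cars)

# Point estimate of the MT versus AT difference, adjusted for wt and qsec.
am_estimate <- unname(coef(bestfit)["amMT"])

# 95% confidence interval for that difference.
am_ci <- confint(bestfit, "amMT", level = 0.95)

print(am_estimate)
print(am_ci)
```

Because the p-value for `amMT` is only slightly below 0.05, the lower end of this interval lies only just above zero, so the size of the manual transmission advantage is estimated with considerable uncertainty.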