Motor Trend magazine is interested in exploring the relationship between a set of variables and miles per gallon (MPG). They are particularly interested in the following two questions:
Is an automatic or manual transmission better for MPG?
Quantify the MPG difference between automatic and manual transmissions.
My analysis shows that:
After adjusting for three additional explanatory variables (cylinders, horsepower and weight), manual transmissions deliver an estimated 1.48 MPG more than automatic transmissions, although this difference is not statistically significant (p = 0.31). The model containing all four predictors explains about 85% of the variation in MPG.
library(datasets)
data(mtcars)
# View the first few rows of the dataset:
head(mtcars, 5)
##                    mpg cyl disp  hp drat    wt  qsec vs am gear carb
## Mazda RX4         21.0   6  160 110 3.90 2.620 16.46  0  1    4    4
## Mazda RX4 Wag     21.0   6  160 110 3.90 2.875 17.02  0  1    4    4
## Datsun 710        22.8   4  108  93 3.85 2.320 18.61  1  1    4    1
## Hornet 4 Drive    21.4   6  258 110 3.08 3.215 19.44  1  0    3    1
## Hornet Sportabout 18.7   8  360 175 3.15 3.440 17.02  0  0    3    2
#Variables:
str(mtcars)
## 'data.frame': 32 obs. of 11 variables:
## $ mpg : num 21 21 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2 ...
## $ cyl : num 6 6 4 6 8 6 8 4 4 6 ...
## $ disp: num 160 160 108 258 360 ...
## $ hp : num 110 110 93 110 175 105 245 62 95 123 ...
## $ drat: num 3.9 3.9 3.85 3.08 3.15 2.76 3.21 3.69 3.92 3.92 ...
## $ wt : num 2.62 2.88 2.32 3.21 3.44 ...
## $ qsec: num 16.5 17 18.6 19.4 17 ...
## $ vs : num 0 0 1 1 0 1 0 1 1 1 ...
## $ am : num 1 1 1 0 0 0 0 0 0 0 ...
## $ gear: num 4 4 4 3 3 3 3 4 4 4 ...
## $ carb: num 4 4 1 1 2 1 4 2 2 4 ...
#Statistical summary of mpg variable:
summary(mtcars$mpg)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 10.40 15.42 19.20 20.09 22.80 33.90
#Visualization ~ Automatic vs Manual Transmission:
library(ggplot2)
mtcars$am <- as.factor(mtcars$am)
transTyp <- ggplot(aes(x=am, y=mpg), data=mtcars) + geom_boxplot(aes(fill=am))
transTyp <- transTyp + labs(title = "Automatic vs Manual Transmission Boxplot")
transTyp <- transTyp + xlab("Transmission Type")
transTyp <- transTyp + ylab("MPG")
transTyp <- transTyp + labs(fill = "Legend (0=AT, 1=MT)")
transTyp
#Automatic vs Manual Transmission boxplot stats:
transStats <- split(mtcars$mpg, mtcars$am)
#Mean:
sapply(transStats, mean)
## 0 1
## 17.14737 24.39231
#Stdev:
sapply(transStats, sd)
## 0 1
## 3.833966 6.166504
#Range:
sapply(transStats, range)
## 0 1
## [1,] 10.4 15.0
## [2,] 24.4 33.9
#Automatic vs Manual Transmission Hypothesis Test:
autoTrans <- mtcars[mtcars$am == "0",]
manTrans <- mtcars[mtcars$am == "1",]
t.test(autoTrans$mpg, manTrans$mpg)
##
## Welch Two Sample t-test
##
## data: autoTrans$mpg and manTrans$mpg
## t = -3.7671, df = 18.332, p-value = 0.001374
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -11.280194 -3.209684
## sample estimates:
## mean of x mean of y
## 17.14737 24.39231
SYNOPSIS: The boxplot above suggests that cars with manual transmissions get better gas mileage than those with automatics. To test this claim, a Welch two-sample t-test is performed. It rejects the null hypothesis that the two transmission types have the same mean MPG (p = 0.0014): manual cars average 24.39 MPG versus 17.15 MPG for automatics, and the 95% confidence interval for the difference excludes zero. Regression analyses are performed next to quantify how much of the variation in gas mileage transmission type accounts for.
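The same Welch test can also be written with the formula interface of `t.test`, which avoids subsetting the data by hand. This is only a sketch: the names `cars` and `tt` are illustrative and not part of the original analysis, and the block works on a fresh copy of the dataset so that it does not depend on earlier steps.

```r
# Fresh copy of the built-in dataset (the name `cars` is an illustrative assumption)
cars <- datasets::mtcars

# Welch two-sample t-test of mpg by transmission type (0 = automatic, 1 = manual)
tt <- t.test(mpg ~ am, data = cars)

tt$estimate   # group means for automatic (0) and manual (1)
tt$conf.int   # 95% confidence interval for mean(automatic) - mean(manual)
tt$p.value    # two-sided p-value

# Manual minus automatic, as a single number
mpg_gain <- unname(diff(tt$estimate))
mpg_gain
```

The group means, confidence interval and p-value returned here should agree with the `t.test` output printed above, since both calls run the same unequal-variance test on the same two groups.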
Linear Regression Model
lrModel <- lm(mpg ~ am, data = mtcars)
summary(lrModel)
##
## Call:
## lm(formula = mpg ~ am, data = mtcars)
##
## Residuals:
## Min 1Q Median 3Q Max
## -9.3923 -3.0923 -0.2974 3.2439 9.5077
##
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept)   17.147      1.125  15.247 1.13e-15 ***
## am1            7.245      1.764   4.106 0.000285 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 4.902 on 30 degrees of freedom
## Multiple R-squared: 0.3598, Adjusted R-squared: 0.3385
## F-statistic: 16.86 on 1 and 30 DF, p-value: 0.000285
Multivariable Regression Model
mrModel <- lm(mpg~am + cyl + hp + wt, data = mtcars)
anova(lrModel, mrModel)
## Analysis of Variance Table
##
## Model 1: mpg ~ am
## Model 2: mpg ~ am + cyl + hp + wt
##   Res.Df   RSS Df Sum of Sq      F    Pr(>F)
## 1     30 720.9
## 2     27 170.0  3     550.9 29.166 1.274e-08 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
summary(mrModel)
##
## Call:
## lm(formula = mpg ~ am + cyl + hp + wt, data = mtcars)
##
## Residuals:
## Min 1Q Median 3Q Max
## -3.4765 -1.8471 -0.5544 1.2758 5.6608
##
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept) 36.14654    3.10478  11.642 4.94e-12 ***
## am1          1.47805    1.44115   1.026   0.3142
## cyl         -0.74516    0.58279  -1.279   0.2119
## hp          -0.02495    0.01365  -1.828   0.0786 .
## wt          -2.60648    0.91984  -2.834   0.0086 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.509 on 27 degrees of freedom
## Multiple R-squared: 0.849, Adjusted R-squared: 0.8267
## F-statistic: 37.96 on 4 and 27 DF, p-value: 1.025e-10
SYNOPSIS: A simple linear regression is fitted first to find out how much of an effect transmission type has on gas mileage, the claim suggested by the preliminary exploratory data analysis. In this model, manual transmissions deliver 7.25 MPG more than automatic ones (the am1 coefficient). However, the R-squared value shows that transmission type alone explains only 36% of the variation in MPG, so this simple model cannot answer Motor Trend’s questions with any definitiveness.

A more logical approach is a multivariable regression that accounts for other variables likely to affect a vehicle’s gas mileage. I therefore added three such variables from the dataset (number of cylinders, engine horsepower and vehicle weight) and refitted the model. The ANOVA comparison shows that the larger model fits significantly better (F = 29.17, p = 1.3e-08), and it explains 85% of the variation in MPG. With these variables held constant, the estimated advantage of manual over automatic transmissions shrinks to 1.48 MPG, and that estimate is not statistically significant at the 5% level (p = 0.31).
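The figures quoted in this synopsis can be pulled out of the fitted models programmatically rather than read off the printed summaries. The following is a minimal sketch, refitting both models on a fresh copy of the data; the names `cars`, `simple`, `multi`, `r2` and `am_ci` are illustrative assumptions, not objects from the analysis above.

```r
# Fresh copy of the data with transmission type as a factor (0 = automatic, 1 = manual)
cars <- datasets::mtcars
cars$am <- factor(cars$am)

# Refit the two models compared in the text
simple <- lm(mpg ~ am, data = cars)
multi  <- lm(mpg ~ am + cyl + hp + wt, data = cars)

# Share of the variation in mpg explained by each model
r2 <- c(simple = summary(simple)$r.squared,
        multi  = summary(multi)$r.squared)
round(r2, 3)

# Adjusted manual-vs-automatic estimate and its 95% confidence interval
coef(multi)["am1"]
am_ci <- confint(multi)["am1", ]
am_ci
```

If the interval in `am_ci` spans zero, the adjusted transmission effect cannot be distinguished from no effect at the 5% level, which is the same conclusion as the p-value of 0.31 in the coefficient table.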
#Scatterplot matrix of the dataset:
pairs(mpg ~ ., data = mtcars)
#Scatterplots of the multivariable regression model residuals:
par(mfrow = c(2,2))
plot(mrModel)
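As a numeric complement to the diagnostic plots, the residuals and influence measures can be inspected directly. This is a sketch under stated assumptions rather than part of the original analysis: the choice of a Shapiro-Wilk test and of leverage and Cook's distance as diagnostics is mine, and the names `cars`, `multi`, `sw`, `lev` and `cook` are illustrative.

```r
# Refit the multivariable model on a fresh copy of the data
cars <- datasets::mtcars
cars$am <- factor(cars$am)
multi <- lm(mpg ~ am + cyl + hp + wt, data = cars)

# Shapiro-Wilk test of the residuals (null hypothesis: residuals are normally distributed)
sw <- shapiro.test(residuals(multi))
sw$p.value

# Leverage (hat values) and Cook's distance for each car
lev  <- hatvalues(multi)
cook <- cooks.distance(multi)

# The three cars with the highest leverage and the largest influence on the fit
head(sort(lev, decreasing = TRUE), 3)
head(sort(cook, decreasing = TRUE), 3)
```

Cars that appear near the top of both lists are the ones worth checking against the labelled points in the residual plots.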