This report explores the relationship between a set of variables and miles per gallon (MPG), with particular interest in the effect of transmission type on fuel efficiency. The analysis clearly indicates higher MPG for cars with manual transmission.
The data were extracted from the 1974 Motor Trend US magazine and comprise fuel consumption and 10 aspects of automobile design and performance for 32 automobiles (1973–74 models). The report is based on the mtcars data set, with 32 observations on 11 numeric variables:
1) mpg - Miles/(US) gallon
2) cyl - Number of cylinders
3) disp - Displacement (cu.in.)
4) hp - Gross horsepower
5) drat - Rear axle ratio
6) wt - Weight (1000 lbs)
7) qsec - 1/4 mile time
8) vs - Engine (0 = V-shaped, 1 = straight)
9) am - Transmission (0 = automatic, 1 = manual)
10) gear - Number of forward gears
11) carb - Number of carburetors
## data extraction and modification
library(datasets)
data(mtcars)
mtcars$cyl <- factor(mtcars$cyl)   ## number of cylinders as a categorical variable
mtcars$gear <- factor(mtcars$gear) ## number of forward gears as a categorical variable
mtcars$carb <- factor(mtcars$carb) ## number of carburetors as a categorical variable
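Converting `cyl`, `gear` and `carb` to factors makes `lm()` treat them as categorical. Each non-reference level then gets its own dummy column instead of a single linear slope. A quick way to see the columns R creates is `model.matrix()`. This is a minimal sketch; by default the first level, "4", becomes the reference.

```r
library(datasets)
data(mtcars)
mtcars$cyl <- factor(mtcars$cyl)  ## levels "4", "6", "8"

## Design-matrix columns for cyl alone: the intercept absorbs the
## reference level "4", leaving dummy columns for 6 and 8 cylinders
design_cols <- colnames(model.matrix(~ cyl, data = mtcars))
print(design_cols)  ## "(Intercept)" "cyl6" "cyl8"
```

This is why the regression coefficients for cylinders are reported as `cyl6` and `cyl8`.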
Exploratory analysis shows that cars with manual transmission have higher MPG than cars with automatic transmission.
boxplot(mpg~am, data = mtcars,
xlab = "Transmission type: 0 - automatic, 1 - manual",
ylab = "Miles per Gallon (MPG)",
main = "MPG by Transmission Type")
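The thick centre line in each box is the group median. The sketch below reads the medians off numerically with base R's `aggregate()` and a formula.

```r
library(datasets)
data(mtcars)

## Median MPG per transmission type (am: 0 = automatic, 1 = manual);
## these are the centre lines drawn by boxplot(mpg ~ am, ...)
med <- aggregate(mpg ~ am, data = mtcars, FUN = median)
print(med)  ## am 0 -> 17.3, am 1 -> 22.8
```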
The observed difference is investigated further with a Welch two-sample t-test (see details in the Appendix).
tt <- t.test(mtcars$mpg~mtcars$am)
The p-value (0.00137) is well below 0.05. The 95% confidence interval for the difference in means (automatic minus manual), -11.3 to -3.2, lies entirely below zero. We can therefore reject the hypothesis of no difference in mean MPG between automatic and manual transmissions and conclude that cars with manual transmission have higher mean MPG. This comparison does not, however, hold the other variables constant.
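The Welch statistic reported by `t.test()` can be reproduced from the group means, variances and sizes. The sketch computes t = (m0 - m1) / sqrt(s0²/n0 + s1²/n1) and the Welch-Satterthwaite degrees of freedom, then checks both against `t.test()`. The helper `welch_by_hand` is a hypothetical name used only for illustration.

```r
library(datasets)
data(mtcars)

## Hypothetical helper: Welch two-sample t statistic and degrees of freedom
welch_by_hand <- function(x0, x1) {
  v0 <- var(x0) / length(x0)  ## squared standard error of the mean, group 0
  v1 <- var(x1) / length(x1)  ## squared standard error of the mean, group 1
  t_stat <- (mean(x0) - mean(x1)) / sqrt(v0 + v1)
  ## Welch-Satterthwaite approximation to the degrees of freedom
  nu <- (v0 + v1)^2 / (v0^2 / (length(x0) - 1) + v1^2 / (length(x1) - 1))
  c(t = t_stat, df = nu)
}

auto   <- mtcars$mpg[mtcars$am == 0]
manual <- mtcars$mpg[mtcars$am == 1]
by_hand <- welch_by_hand(auto, manual)

tt <- t.test(mtcars$mpg ~ mtcars$am)   ## same call as in the report
print(by_hand)
print(c(tt$statistic, tt$parameter))   ## t = -3.7671, df = 18.332
print(tt$p.value)                      ## 0.001374 (rounded)
print(tt$conf.int)                     ## -11.280194 -3.209684
```

The formula interface orders the groups as am = 0 then am = 1, so the statistic and interval refer to automatic minus manual. That is why both are negative.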
The average MPG is substantially higher for manual transmission (24.4) than for automatic transmission (17.1).
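A minimal sketch that reproduces these averages with base R's `tapply()` and also computes the absolute and relative difference between the two groups:

```r
library(datasets)
data(mtcars)

## Mean MPG per transmission type (names "0" = automatic, "1" = manual)
means <- tapply(mtcars$mpg, mtcars$am, mean)
print(round(means, 2))  ## 17.15 and 24.39

## Absolute and relative difference, manual vs automatic
diff_mpg  <- means[["1"]] - means[["0"]]
ratio_mpg <- means[["1"]] / means[["0"]]
print(round(diff_mpg, 2))   ## 7.24
print(round(ratio_mpg, 2))  ## 1.42
```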
To investigate the MPG difference in more detail, we need a model that also explains MPG in terms of the other variables. Linear regression appears appropriate for this data set.
To find an efficient model, we must identify the right set of predictors. This was done by stepwise regression in both directions, minimizing the AIC (Akaike information criterion).
fit_all <- lm(mpg ~ ., data = mtcars) ## model with all variables
fit_best <- step(fit_all, direction = "both", trace = 0) ## stepwise selection by AIC, search log silenced
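`step()` compares candidate models by the AIC returned from `extractAIC()`. For linear models this is n·log(RSS/n) + 2k, where k is the number of estimated coefficients. It differs from `AIC()` only by a constant for a given data set, so both rank models the same way. The sketch repeats the report's setup and checks three things: that the selected model has a lower AIC than the full one, that the formula above matches `extractAIC()`, and which terms were kept.

```r
library(datasets)
data(mtcars)
mtcars$cyl  <- factor(mtcars$cyl)
mtcars$gear <- factor(mtcars$gear)
mtcars$carb <- factor(mtcars$carb)

fit_all  <- lm(mpg ~ ., data = mtcars)
fit_best <- step(fit_all, direction = "both", trace = 0)  ## silence the search log

## extractAIC() returns c(edf, AIC); step() minimises the second element
aic_all  <- extractAIC(fit_all)[2]
aic_best <- extractAIC(fit_best)[2]

## Same quantity for the selected model computed by hand: n * log(RSS / n) + 2 * k
n <- nrow(mtcars)
k <- length(coef(fit_best))
aic_manual <- n * log(sum(residuals(fit_best)^2) / n) + 2 * k

print(c(full = aic_all, selected = aic_best, by_hand = aic_manual))
print(formula(fit_best))
```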
Stepwise selection retains four variables: cyl, hp, wt and am. MPG is negatively associated with horsepower and weight, and 6- and 8-cylinder cars have lower MPG than 4-cylinder cars. MPG is positively associated with manual transmission: holding the other three variables fixed, a manual car is predicted to get about 1.8 MPG more than an automatic one.
coef(fit_best)
## (Intercept) cyl6 cyl8 hp wt am
## 33.70832390 -3.03134449 -2.16367532 -0.03210943 -2.49682942 1.80921138
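Each coefficient is a partial effect. With cyl, hp and wt held fixed, switching am from 0 to 1 changes the predicted MPG by exactly the am coefficient. The sketch fits the selected model directly and checks this on a hypothetical 6-cylinder car with 110 hp weighing 2,800 lbs; these values are chosen only for illustration. `confint()` gives the corresponding 95% interval for the am coefficient.

```r
library(datasets)
data(mtcars)
mtcars$cyl <- factor(mtcars$cyl)

## The model selected by step() in the report
fit_best <- lm(mpg ~ cyl + hp + wt + am, data = mtcars)

## Hypothetical car, identical except for transmission type
new_cars <- data.frame(
  cyl = factor(c("6", "6"), levels = levels(mtcars$cyl)),
  hp  = c(110, 110),
  wt  = c(2.8, 2.8),  ## weight in 1000 lbs
  am  = c(0, 1)       ## 0 = automatic, 1 = manual
)
pred <- predict(fit_best, newdata = new_cars)

## The difference in predictions equals the am coefficient
print(round(unname(pred[2] - pred[1]), 4))  ## 1.8092
print(round(coef(fit_best)[["am"]], 4))     ## 1.8092
print(confint(fit_best)["am", ])            ## 95% interval for the am effect
```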
The R-squared is 0.87 for the four-variable model versus 0.89 for the full model, so the smaller model gives up little explanatory power.
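Both figures come from `summary()` on the fitted models. R-squared never decreases when predictors are added. The adjusted R-squared, which penalises extra parameters, is therefore a fairer basis for comparing the four-variable model with the full one. A minimal sketch:

```r
library(datasets)
data(mtcars)
mtcars$cyl  <- factor(mtcars$cyl)
mtcars$gear <- factor(mtcars$gear)
mtcars$carb <- factor(mtcars$carb)

fit_all  <- lm(mpg ~ ., data = mtcars)                   ## all 10 predictors
fit_best <- lm(mpg ~ cyl + hp + wt + am, data = mtcars)  ## selected model

## Multiple and adjusted R-squared for each model (columns = models)
r2 <- sapply(list(full = fit_all, selected = fit_best), function(f) {
  s <- summary(f)
  c(r.squared = s$r.squared, adj.r.squared = s$adj.r.squared)
})
print(round(r2, 3))
```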
Residual diagnostics are presented in the Appendix. The residual plots show no strong pattern, which supports the model choice.
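The visual check can be complemented by a formal test. As an illustration that is not part of the original analysis, base R's `shapiro.test()` applies the Shapiro-Wilk test for normality to the residuals. A large p-value means no evidence against normality, not proof of it.

```r
library(datasets)
data(mtcars)
mtcars$cyl <- factor(mtcars$cyl)
fit_best <- lm(mpg ~ cyl + hp + wt + am, data = mtcars)

res <- residuals(fit_best)
print(round(mean(res), 10))  ## residuals of a least-squares fit with intercept average zero

## Shapiro-Wilk normality test on the residuals
sw <- shapiro.test(res)
print(sw)
```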
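In conclusion, MPG is higher for cars with manual transmission than for cars with automatic transmission. The t-test p-value and confidence interval give strong evidence of a difference in mean MPG. The average MPG is higher for manual transmission, and the regression coefficient for manual transmission remains positive after adjusting for cylinders, horsepower and weight. However, the adjusted difference (about 1.8 MPG) is much smaller than the unadjusted one (about 7.2 MPG).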
However, other variables not included in the data set may also affect MPG. The data may also be confounded, since different types of cars tend to have different transmission types; for example, the automatic cars in this sample are heavier on average.
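One such imbalance can be checked directly. The sketch compares mean weight by transmission type, then compares the am coefficient with and without adjustment for weight. In the unadjusted model `lm(mpg ~ am)` the am slope equals the raw difference in means (about 7.24 MPG), and the sketch prints how much it shrinks once `wt` is added. This is an illustrative check, not part of the original analysis.

```r
library(datasets)
data(mtcars)

## Mean weight (1000 lbs) by transmission: 0 = automatic, 1 = manual
w <- tapply(mtcars$wt, mtcars$am, mean)
print(round(w, 2))  ## 3.77 and 2.41

## am coefficient without and with adjustment for weight
b_raw <- coef(lm(mpg ~ am, data = mtcars))[["am"]]
b_wt  <- coef(lm(mpg ~ am + wt, data = mtcars))[["am"]]
print(round(c(unadjusted = b_raw, weight_adjusted = b_wt), 2))
```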
t.test(mtcars$mpg~mtcars$am)
##
## Welch Two Sample t-test
##
## data: mtcars$mpg by mtcars$am
## t = -3.7671, df = 18.332, p-value = 0.001374
## alternative hypothesis: true difference in means between group 0 and group 1 is not equal to 0
## 95 percent confidence interval:
## -11.280194 -3.209684
## sample estimates:
## mean in group 0 mean in group 1
## 17.14737 24.39231
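par(mfrow = c(2, 2)) ## 2 x 2 grid for the four diagnostic plots
plot(fit_best)       ## residuals vs fitted, normal Q-Q, scale-location, residuals vs leverage
mtext("Analysis of Residuals", side = 3, line = -2, outer = TRUE)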
pairs(mpg ~ .,
data = mtcars,
main="Relations of variables")