Executive Summary

This report analyzes the relationship between transmission type (manual or automatic) and miles per gallon (MPG) and determines which transmission type yields a higher MPG. The mtcars dataset is used for this analysis. A simple t-test shows that manual transmission vehicles average 7.245 MPG more than automatic transmission vehicles. Linear regression models, however, show that transmission type is a less significant predictor of MPG once other variables are accounted for, with an improvement of only 1.81 MPG. Weight, number of cylinders, and horsepower contribute more significantly to the overall MPG of vehicles.

The analysis of this data is composed of three parts:

  1. Exploratory Data Analysis, in which the data is loaded, compiled, and subject to an initial graphical examination;
  2. Regression Analysis, in which a linear model is fit to the data. This part also contains discussions on model selection, validation (by residual analysis), and interpretation of the relevant regression coefficients;
  3. Appendix, wherein the plots that are used to support the discussion throughout this report are presented.

Exploratory Data Analysis

Load the mtcars data set and convert some variables from numeric class to factor class.

library(ggplot2)
data(mtcars)
mtcars[1:3, ] # Sample Data
##                mpg cyl disp  hp drat    wt  qsec vs am gear carb
## Mazda RX4     21.0   6  160 110 3.90 2.620 16.46  0  1    4    4
## Mazda RX4 Wag 21.0   6  160 110 3.90 2.875 17.02  0  1    4    4
## Datsun 710    22.8   4  108  93 3.85 2.320 18.61  1  1    4    1
dim(mtcars)
## [1] 32 11
mtcars$cyl <- as.factor(mtcars$cyl)
mtcars$vs <- as.factor(mtcars$vs)
mtcars$am <- factor(mtcars$am)
mtcars$gear <- factor(mtcars$gear)
mtcars$carb <- factor(mtcars$carb)
attach(mtcars)
## The following object is masked from package:ggplot2:
## 
##     mpg

Refer to the box plot (Figure 1) and the pairs plot (Figure 2) in the Appendix: Figures section.

The box plot indicates that manual transmission yields higher values of MPG in general.
The pairs plot shows stronger correlations between MPG and other variables, including weight, cylinders, and horsepower, suggesting that MPG is more closely associated with those factors.
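The visual impression from the pairs plot can also be checked numerically. The following is a minimal sketch, not part of the original analysis, that computes the Pearson correlation of mpg with every other variable. It works on a fresh numeric copy of the data (the illustrative name `mtcars_num` is an assumption), because `cor()` cannot run on the factor columns created above.

```r
# Fresh numeric copy of the data; mtcars_num is an illustrative name
mtcars_num <- datasets::mtcars

# Pearson correlation of mpg with each of the other ten variables
# (mpg is the first column, so -1 drops its correlation with itself)
mpg_cor <- cor(mtcars_num)["mpg", -1]

# Sort by absolute strength, strongest first
mpg_cor <- mpg_cor[order(abs(mpg_cor), decreasing = TRUE)]
round(mpg_cor, 3)
```

In the stock dataset, weight has the strongest linear association with mpg, ahead of transmission type.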

Inference

Assume, as the null hypothesis, that the MPG values of automatic and manual cars come from the same normally distributed population. This hypothesis can be tested with a two-sample t-test as follows:

result <- t.test(mpg ~ am, data = mtcars)
result$p.value
## [1] 0.001373638
result$estimate
## mean in group 0 mean in group 1 
##        17.14737        24.39231

Since the p-value is 0.00137, we reject the null hypothesis at the 0.05 level: the MPG values of automatic and manual cars come from different populations. The mean MPG of manual-transmission cars is about 7.245 higher than that of automatic-transmission cars.
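Note that `t.test()` runs Welch's unequal-variance test by default. As a small sketch, assuming a fresh copy of the data under the illustrative name `dat`, the estimated difference and its 95% confidence interval can be extracted directly:

```r
# Fresh copy of the data with a labelled transmission factor
dat <- datasets::mtcars
dat$am <- factor(dat$am, labels = c("Automatic", "Manual"))

# Welch two-sample t-test (var.equal = FALSE is the default)
welch <- t.test(mpg ~ am, data = dat)

# Difference in group means (manual minus automatic)
group_means <- tapply(dat$mpg, dat$am, mean)
mean_diff <- unname(group_means["Manual"] - group_means["Automatic"])
mean_diff

# 95% confidence interval for (automatic minus manual), as reported by t.test
welch$conf.int
```

Because the interval is reported for automatic minus manual, both of its bounds are negative when manual cars have the higher mean.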

Regression Analysis

Build the full model as follows:

fullModel <- lm(mpg ~ ., data=mtcars)
summary(fullModel) # results hidden

The results are:
The residual standard error is 2.833 on 15 degrees of freedom.
The adjusted R-squared value is 0.779. In other words, the model explains about 78% of the variance of the MPG variable. However, none of the coefficients is significant at the 0.05 significance level.

Select a reduced model by backward stepwise selection with the BIC penalty (k = log(n)) as follows:

stepModel <- step(fullModel, k=log(nrow(mtcars)))
summary(stepModel) # results hidden

The selected model is “mpg ~ wt + qsec + am”. It has a residual standard error of 2.459 on 28 degrees of freedom and an adjusted R-squared value of 0.8336, which means that the model explains about 83% of the variance of the MPG variable. All of the coefficients are significant at the 0.05 significance level.
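Passing `k = log(n)` to `step()` replaces the default AIC penalty (`k = 2`) with the BIC penalty, which favors smaller models. The sketch below rebuilds the data under assumed names (`dat`, `full_fit`, `bic_fit`) and compares the BIC of the full and selected models. It illustrates the mechanism and is not a replacement for the output above.

```r
# Rebuild the data with the same factor conversions used in this report
dat <- datasets::mtcars
factor_cols <- c("cyl", "vs", "am", "gear", "carb")
dat[factor_cols] <- lapply(dat[factor_cols], factor)

full_fit <- lm(mpg ~ ., data = dat)

# k = log(n) gives the BIC penalty; the default k = 2 would give AIC.
# With no scope argument, step() searches backward from the full model.
# trace = 0 suppresses the step-by-step log.
bic_fit <- step(full_fit, k = log(nrow(dat)), trace = 0)

# Selected formula and the BIC of both models (lower is better)
formula(bic_fit)
c(full = BIC(full_fit), selected = BIC(bic_fit))
```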

Please refer to the Appendix: Figures section again (Figure 3). The scatter plot suggests an interaction between the “wt” variable and the “am” variable, since automatic cars tend to be heavier than manual cars. Thus, we fit the following model, which includes the interaction term:

amIntWtModel<-lm(mpg ~ wt + qsec + am + wt:am, data=mtcars)
summary(amIntWtModel) # results hidden

This model has a residual standard error of 2.084 on 27 degrees of freedom. The adjusted R-squared value is 0.8804, which indicates that the model explains about 88% of the variance of the MPG variable. All of the coefficients are significant at the 0.05 significance level.

As part of the next analysis, fit the simple model with MPG as the outcome variable and Transmission as the predictor variable.

amModel<-lm(mpg ~ am, data=mtcars)
summary(amModel) # results hidden

It shows that, on average, a car with automatic transmission gets 17.147 MPG, and a car with manual transmission gets 7.245 MPG more. This model has a residual standard error of 4.902 on 30 degrees of freedom.

The adjusted R-squared value is 0.3385, which means that the model explains only about 34% of the variance of the MPG variable. The low adjusted R-squared value indicates that additional variables have to be added to the model.
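In a regression on a single two-level factor, the intercept is the mean of the reference group and the slope is the difference between the two group means, which is why the numbers here match the t-test estimates. A minimal sketch, using the assumed names `dat` and `simple_fit`:

```r
dat <- datasets::mtcars
dat$am <- factor(dat$am)  # 0 = automatic (reference level), 1 = manual

simple_fit <- lm(mpg ~ am, data = dat)
group_means <- tapply(dat$mpg, dat$am, mean)

# Intercept equals the automatic mean; the am1 coefficient equals the gap
coef(simple_fit)
c(automatic = group_means[["0"]],
  gap = group_means[["1"]] - group_means[["0"]])
```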

Compare the candidate models and compute confidence intervals for the interaction model:

anova(amModel, stepModel, fullModel, amIntWtModel) 
confint(amIntWtModel) # results hidden

Select the model with the highest Adjusted R-squared value, “mpg ~ wt + qsec + am + wt:am”.
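The adjusted R-squared values quoted in this section can be tabulated side by side. The sketch below refits the four models on a fresh copy of the data (`dat` and `models` are illustrative names) and sorts them by adjusted R-squared.

```r
# Rebuild the data with the same factor conversions used in this report
dat <- datasets::mtcars
factor_cols <- c("cyl", "vs", "am", "gear", "carb")
dat[factor_cols] <- lapply(dat[factor_cols], factor)

# The four candidate models discussed above
models <- list(
  amModel      = lm(mpg ~ am, data = dat),
  stepModel    = lm(mpg ~ wt + qsec + am, data = dat),
  fullModel    = lm(mpg ~ ., data = dat),
  amIntWtModel = lm(mpg ~ wt + qsec + am + wt:am, data = dat)
)

# Adjusted R-squared of each model, largest first
adj_r2 <- sapply(models, function(m) summary(m)$adj.r.squared)
round(sort(adj_r2, decreasing = TRUE), 4)
```

Adjusted R-squared is only one possible selection criterion. It is used here because it is the criterion this report states.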

summary(amIntWtModel)$coef
##              Estimate Std. Error   t value     Pr(>|t|)
## (Intercept)  9.723053  5.8990407  1.648243 0.1108925394
## wt          -2.936531  0.6660253 -4.409038 0.0001488947
## qsec         1.016974  0.2520152  4.035366 0.0004030165
## am1         14.079428  3.4352512  4.098515 0.0003408693
## wt:am1      -4.141376  1.1968119 -3.460340 0.0018085763

Thus, when weight (“wt”, in units of 1000 lbs) and “qsec” (1/4 mile time) are held constant, cars with manual transmission get 14.079 + (-4.141)*wt more MPG (miles per gallon) on average than cars with automatic transmission. For example, a manual-transmission car that weighs 2000 lbs gets 5.797 more MPG than an automatic-transmission car with the same weight and 1/4 mile time.
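Because of the interaction term, the manual-versus-automatic difference depends on weight. The short sketch below evaluates it at a few weights, using the coefficients copied from the table above. The helper `manual_effect()` and the variable names are illustrative, not part of the original analysis.

```r
# Coefficients copied from the summary(amIntWtModel)$coef table
b_am    <- 14.079428  # am1
b_wt_am <- -4.141376  # wt:am1

# Estimated MPG difference (manual minus automatic) at a given weight,
# where wt is in units of 1000 lbs and qsec is held fixed
manual_effect <- function(wt) b_am + b_wt_am * wt

# Differences at 2000, 3000, and 4000 lbs
manual_effect(c(2, 3, 4))

# Weight at which the estimated difference is zero
crossover_wt <- -b_am / b_wt_am
crossover_wt
```

Under this model, the estimated advantage of a manual transmission shrinks as weight increases and changes sign at roughly 3,400 lbs.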

Residual Analysis and Diagnostics

The Appendix: Figures section contains the residual plots (Figure 4). Based on these plots, the following assumptions can be checked:

  1. The Residuals vs. Fitted plot shows no pattern, supporting the independence assumption.
  2. The Normal Q-Q plot indicates that the residuals are approximately normally distributed, because the points lie close to the line.
  3. The Scale-Location plot supports the constant variance assumption, as the points are randomly distributed.
  4. The Residuals vs. Leverage plot indicates that no influential outliers are present, as all values fall well within the 0.5 Cook's distance bands.

As a further check, count the observations whose dfbetas exceed 1 in absolute value:

sum((abs(dfbetas(amIntWtModel)))>1)
## [1] 0
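The same influence measures can be inspected per observation rather than as a single count. The following is a hedged sketch under assumed names (`dat`, `fit`, `influence_tbl`). It lists leverage and Cook's distance for each car and breaks the dfbetas count down by coefficient.

```r
dat <- datasets::mtcars
dat$am <- factor(dat$am)
fit <- lm(mpg ~ wt + qsec + am + wt:am, data = dat)

# Leverage (hat values) and Cook's distance for every car
influence_tbl <- data.frame(
  leverage = hatvalues(fit),
  cooks_d  = cooks.distance(fit)
)

# The five cars with the largest Cook's distance
head(influence_tbl[order(influence_tbl$cooks_d, decreasing = TRUE), ], 5)

# Number of observations with |dfbetas| > 1, per coefficient
colSums(abs(dfbetas(fit)) > 1)
```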

Conclusion

The dataset is very limited, with only 32 entries. Additional data might provide better inference. However, with the given data:

We have determined that transmission type (manual vs. automatic) makes a difference to MPG, and we have quantified that difference.

Transmission type alone does not seem to explain the variation in MPG. Other factors, including horsepower, weight, qsec, and cylinders, are more significant variables.

Appendix: Figures

Figure 1. Histogram and Kernel Density Plot of MPG, and Boxplot of MPG by Transmission

par(mfrow = c(1, 2))
# Histogram with Normal Curve
x <- mtcars$mpg
h<-hist(x, breaks=10, col="Green", xlab="Miles Per Gallon",
   main="Histogram of Miles per Gallon")
xfit<-seq(min(x),max(x),length=40)
yfit<-dnorm(xfit,mean=mean(x),sd=sd(x))
yfit <- yfit*diff(h$mids[1:2])*length(x)
lines(xfit, yfit, col="blue", lwd=2)
# Kernel Density Plot
d <- density(mtcars$mpg)
plot(d, xlab = "MPG", main ="Density Plot of MPG")

mtcars$am <- as.factor(mtcars$am)
transTyp <- ggplot(aes(x=am, y=mpg), data=mtcars) + geom_boxplot(aes(fill=am))
transTyp <- transTyp + labs(title = "Automatic vs Manual Transmission Boxplot")
transTyp <- transTyp + xlab("Transmission Type")
transTyp <- transTyp + ylab("MPG")
transTyp <- transTyp + labs(fill = "Legend (0=AT, 1=MT)")
transTyp

Figure 2. Scatterplots produced by plotting each variable against all others

pairs(mtcars,panel=panel.smooth, pch=16, cex=0.5, gap=0.25, lwd=2, las=1, cex.axis=0.7)

Figure 3. Scatter Plot of MPG vs. Weight by Transmission

# height and width are not point aesthetics, so they are dropped from aes()
ggplot(mtcars, aes(x=wt, y=mpg, group=am, color=am)) + geom_point() +
scale_colour_discrete(labels=c("Automatic", "Manual")) +
xlab("Weight (1000 lbs)") + ggtitle("Scatter Plot of MPG vs. Weight by Transmission")

Figure 4. Residual Plots

par(mfrow = c(2, 2))
plot(amIntWtModel)