Executive Summary

Motor Trend, a magazine about the automobile industry, is interested in exploring the relationship between a set of variables and miles per gallon (MPG), the outcome. It is particularly interested in two questions: “Is an automatic or manual transmission better for MPG?” and “How large is the MPG difference between automatic and manual transmissions?”
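Using linear regression on the mtcars dataset, we found that, on average, a car with a manual transmission gets about 1.81 more miles per gallon than a car with an automatic transmission when the number of cylinders, horsepower, and weight are held constant. However, this difference is not statistically significant at the 5% level (p ≈ 0.21), and its 95% confidence interval, [-1.06, 4.68], includes zero.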

Exploratory Data Analyses

The mtcars dataset contains 32 observations of 11 variables: the outcome, miles per gallon (mpg), and 10 predictors. The first three observations are shown below. Since the predictors cyl, vs, am, gear and carb are categorical, they were converted to factors. Figure 1 (appendix) shows the relationship between MPG and several of the predictors; further figures are also in the appendix.

#Loading the mtcars dataset
data(mtcars)
data <- mtcars
# Displaying the first three rows of the dataset
head(data, n=3)
##                mpg cyl disp  hp drat    wt  qsec vs am gear carb
## Mazda RX4     21.0   6  160 110 3.90 2.620 16.46  0  1    4    4
## Mazda RX4 Wag 21.0   6  160 110 3.90 2.875 17.02  0  1    4    4
## Datsun 710    22.8   4  108  93 3.85 2.320 18.61  1  1    4    1
#Converting categorical predictors to factors
data$cyl <- factor(data$cyl)
data$vs <- factor(data$vs)
data$am <- factor(data$am, labels = c("Automatic", "Manual"))
data$gear <- factor(data$gear)
data$carb <- factor(data$carb)
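For context, the unadjusted difference in mean MPG between the two transmission groups can be computed directly. This is a minimal sketch that is not part of the original analysis; the object names `mt`, `group_means` and `raw_diff` are illustrative. It ignores cylinders, horsepower and weight, so it should not be read as the adjusted effect estimated by the regression below.

```r
# Unadjusted comparison of mean MPG by transmission type (illustrative sketch).
data(mtcars)
mt <- mtcars
mt$am <- factor(mt$am, labels = c("Automatic", "Manual"))

# Mean MPG within each transmission group
group_means <- tapply(mt$mpg, mt$am, mean)
print(round(group_means, 3))   # Automatic 17.147, Manual 24.392

# Raw difference in means: Manual minus Automatic
raw_diff <- unname(group_means["Manual"] - group_means["Automatic"])
print(round(raw_diff, 3))      # about 7.245
```

The raw gap of roughly 7.2 mpg is far larger than the adjusted estimate of about 1.81 mpg, which is consistent with the other predictors accounting for much of the unadjusted difference.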

Model Construction and Selection

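A forward selection approach was used: starting from a model with transmission type (am) as the only predictor, the remaining predictors were added one at a time to build a sequence of nested models, and an analysis of variance (anova) comparing these nested models was used to choose the final model, fit4 (am + cyl + hp + wt).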

# Start with transmission type alone, then add one predictor at a time,
# so that each model is nested in the next
fit1 <- lm(mpg ~ am, data = data)
fit2 <- update(fit1, . ~ . + cyl)
fit3 <- update(fit2, . ~ . + hp)
fit4 <- update(fit3, . ~ . + wt)
fit5 <- update(fit4, . ~ . + carb)
fit6 <- update(fit5, . ~ . + gear)
fit7 <- update(fit6, . ~ . + vs)
fit8 <- update(fit7, . ~ . + qsec)
fit9 <- update(fit8, . ~ . + disp)
# Sequential F-tests comparing each model with the previous one
anova(fit1, fit2, fit3, fit4, fit5, fit6, fit7, fit8, fit9)
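Each row of the anova table tests whether the added predictor significantly reduces the residual sum of squares. As a sketch of that computation, the F statistic for one step (adding wt to am + cyl + hp) can be reproduced by hand and compared with `anova(fit3, fit4)`. Note that when `anova()` is given more than two models, as in the call above, the `anova.lm` documentation states that each row is compared with the residuals of the largest model considered (here fit9), so the F values in the nine-model table can differ from this pairwise version. The names `F_manual` and `p_manual` are illustrative.

```r
# Partial F-test for two nested linear models, computed by hand.
data(mtcars)
d <- mtcars
d$cyl <- factor(d$cyl)
d$am  <- factor(d$am, labels = c("Automatic", "Manual"))

fit3 <- lm(mpg ~ am + cyl + hp, data = d)
fit4 <- lm(mpg ~ am + cyl + hp + wt, data = d)

rss3 <- sum(resid(fit3)^2)                        # RSS of the smaller model
rss4 <- sum(resid(fit4)^2)                        # RSS of the larger model
df_diff <- df.residual(fit3) - df.residual(fit4)  # number of added coefficients

# F = [(RSS_small - RSS_large) / df_diff] / [RSS_large / df_large]
F_manual <- ((rss3 - rss4) / df_diff) / (rss4 / df.residual(fit4))
p_manual <- pf(F_manual, df_diff, df.residual(fit4), lower.tail = FALSE)

# R's built-in comparison of the same two models
a <- anova(fit3, fit4)
print(c(F_manual = F_manual, F_anova = a$F[2]))
```

Because only one coefficient is added, this F statistic also equals the square of wt's t value in fit4.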

Interpretation of the Selected Model Coefficients

The coefficients of the selected model, fit4 <- lm(mpg ~ am + cyl + hp + wt, data = data), are shown below. The amManual coefficient means that, on average, a car with a manual transmission gets about 1.81 more miles per gallon than a car with an automatic transmission when the number of cylinders, horsepower, and weight are held constant. The remaining coefficients are all negative: holding the other predictors constant, 6- and 8-cylinder cars are associated with lower MPG than 4-cylinder cars (the reference level), and MPG decreases as horsepower and weight increase. The relationship between the outcome and these four predictors is shown in Figure 1 (appendix).

fit4$coef
## (Intercept)    amManual        cyl6        cyl8          hp          wt 
##    33.70832     1.80921    -3.03134    -2.16368    -0.03211    -2.49683
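Because the model is additive (it has no interaction terms), the amManual coefficient is exactly the difference in predicted MPG between two cars that share the same cylinders, horsepower and weight but differ in transmission. The sketch below checks this with `predict()`; the example car (6 cylinders, 110 hp, wt = 2.62) is an arbitrary illustrative choice, and any other values would give the same gap.

```r
# The amManual coefficient as a difference of two predictions.
data(mtcars)
d <- mtcars
d$cyl <- factor(d$cyl)
d$am  <- factor(d$am, labels = c("Automatic", "Manual"))
fit4 <- lm(mpg ~ am + cyl + hp + wt, data = d)

# Two illustrative cars that differ only in transmission type
new_cars <- data.frame(
  am  = factor(c("Automatic", "Manual"), levels = levels(d$am)),
  cyl = factor(c("6", "6"), levels = levels(d$cyl)),
  hp  = c(110, 110),
  wt  = c(2.62, 2.62)
)
pred <- predict(fit4, newdata = new_cars)
gap  <- unname(pred[2] - pred[1])   # Manual minus Automatic

print(round(pred, 2))
print(round(gap, 5))                # same as coef(fit4)["amManual"]
```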

Residual Plot and Diagnostics

The residual standard error, summary(fit4)$sigma, is 2.4101 (a residual variance of about 5.81).
The PRESS (leave-one-out prediction) residuals, computed as resid(fit4)/(1 - hatvalues(fit4)), are: -1.8438, -1.0338, -4.4354, 2.7971, 1.5804, -0.6873, -0.5782, 0.8092, 0.0092, 1.3012, -0.4148, 0.8786, 0.9437, -1.2493, -1.9375, -0.9887, 5.2594, 5.185, 0.697, 5.7909, -4.0695, -2.9547, -3.6121, -0.9331, 3.2001, -1.4473, -1.3991, 2.777, -1.5095, -0.3264, 2.4903, -4.4.
The density plot and the normal Q-Q plot of the residuals suggest that the residuals are approximately normally distributed. Figure 2 (appendix) shows further residual plots, and Figure 3 (appendix) shows leverage plots.

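# Coefficient table (estimates, standard errors, t values, p-values)
co <- summary(fit4)$coefficients
# Residual density and normal Q-Q plot side by side
par(mfrow = c(1, 2))
plot(density(resid(fit4)), main = "Density of Residuals")
qqnorm(resid(fit4))
qqline(resid(fit4))  # reference line for judging normality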

[Figure: Density of Residuals (left) and Normal Q-Q Plot of the residuals (right)]
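The PRESS residuals listed above use the shortcut e_i / (1 - h_ii), where e_i is the ordinary residual and h_ii the leverage (hat value) of observation i. The sketch below checks that shortcut against an explicit leave-one-out refit, and recomputes the residual standard error as the square root of the residual sum of squares divided by the residual degrees of freedom. The names `press_shortcut`, `press_loo`, `press_stat` and `sigma_manual` are illustrative.

```r
# PRESS residuals: shortcut formula vs. brute-force leave-one-out refits.
data(mtcars)
d <- mtcars
d$cyl <- factor(d$cyl)
d$am  <- factor(d$am, labels = c("Automatic", "Manual"))
fit4 <- lm(mpg ~ am + cyl + hp + wt, data = d)

# Shortcut used in the text: e_i / (1 - h_ii)
press_shortcut <- resid(fit4) / (1 - hatvalues(fit4))

# Brute force: refit without observation i, then predict observation i
press_loo <- sapply(seq_len(nrow(d)), function(i) {
  fit_i <- lm(mpg ~ am + cyl + hp + wt, data = d[-i, ])
  d$mpg[i] - predict(fit_i, newdata = d[i, ])
})

# PRESS statistic: sum of squared leave-one-out prediction errors
press_stat <- sum(press_shortcut^2)

# Residual standard error: sqrt(RSS / residual degrees of freedom)
sigma_manual <- sqrt(sum(resid(fit4)^2) / df.residual(fit4))

print(round(press_stat, 2))
print(round(sigma_manual, 4))
```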

Quantifying the Uncertainty

summary(fit4)$coefficients
##             Estimate Std. Error t value  Pr(>|t|)
## (Intercept) 33.70832    2.60489 12.9404 7.733e-13
## amManual     1.80921    1.39630  1.2957 2.065e-01
## cyl6        -3.03134    1.40728 -2.1540 4.068e-02
## cyl8        -2.16368    2.28425 -0.9472 3.523e-01
## hp          -0.03211    0.01369 -2.3450 2.693e-02
## wt          -2.49683    0.88559 -2.8194 9.081e-03
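Each t value in the table is the estimate divided by its standard error, and each p-value is the two-sided tail probability of a t distribution with the model's residual degrees of freedom (32 observations minus 6 coefficients, i.e. 26). The sketch below reproduces both columns. For amManual the p-value is about 0.21, so the transmission effect is not statistically significant at the 5% level once cylinders, horsepower and weight are in the model.

```r
# Reproducing the t values and p-values of summary(fit4) by hand.
data(mtcars)
d <- mtcars
d$cyl <- factor(d$cyl)
d$am  <- factor(d$am, labels = c("Automatic", "Manual"))
fit4 <- lm(mpg ~ am + cyl + hp + wt, data = d)

co <- summary(fit4)$coefficients
t_manual <- co[, "Estimate"] / co[, "Std. Error"]
# Two-sided p-value from a t distribution with the residual degrees of freedom
p_manual <- 2 * pt(abs(t_manual), df = df.residual(fit4), lower.tail = FALSE)

print(round(cbind(t = t_manual, p = p_manual), 4))
```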

95% confidence intervals for the coefficients (t-based, with 26 residual degrees of freedom):
Intercept: [28.3539, 39.0627]
amManual: [-1.0609, 4.6794]
cyl6: [-5.9241, -0.1386]
cyl8: [-6.859, 2.5317]
hp: [-0.0603, -0.004]
wt: [-4.3172, -0.6765]
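The intervals listed above match 95% t-based intervals, estimate ± t(0.975, 26) × standard error, which is presumably how they were computed from the `co` table created earlier; that is an assumption, since the original code for this step is not shown. The sketch computes them by hand and checks the result against `confint()`.

```r
# 95% confidence intervals by hand vs. confint().
data(mtcars)
d <- mtcars
d$cyl <- factor(d$cyl)
d$am  <- factor(d$am, labels = c("Automatic", "Manual"))
fit4 <- lm(mpg ~ am + cyl + hp + wt, data = d)

co <- summary(fit4)$coefficients
t_crit <- qt(0.975, df = df.residual(fit4))   # critical value with 26 df

ci_manual <- cbind(
  lower = co[, "Estimate"] - t_crit * co[, "Std. Error"],
  upper = co[, "Estimate"] + t_crit * co[, "Std. Error"]
)
ci_builtin <- confint(fit4, level = 0.95)

print(round(ci_manual, 4))
```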

Conclusion

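Our model shows that, on average, a car with a manual transmission gets about 1.81 more miles per gallon than a car with an automatic transmission when the number of cylinders, horsepower, and weight are held constant. The uncertainty in this estimate is captured by its 95% confidence interval, [-1.0609, 4.6794]. Because this interval includes zero (p = 0.2065), the data do not provide strong evidence that either transmission type is better for MPG once these variables are accounted for.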

Appendix

# Figure 1: MPG against six predictors (box plots for factors, scatter plots otherwise)
par(mfrow = c(3, 2), oma = c(3, 0, 0, 0))
plot(data$cyl, data$mpg, xlab = "Number of Cylinders", ylab = "Miles Per Gallon", main = "MPG vs Number of Cylinders")
plot(data$am, data$mpg, xlab = "Transmission Type", ylab = "Miles Per Gallon", main = "MPG vs Transmission Type")
plot(data$hp, data$mpg, xlab = "Gross Horsepower", ylab = "Miles Per Gallon", main = "MPG vs Gross Horsepower")
plot(data$wt, data$mpg, xlab = "Car's Weight (lb/1000)", ylab = "Miles Per Gallon", main = "MPG vs Car's Weight")
plot(data$disp, data$mpg, xlab = "Displacement (cu.in.)", ylab = "Miles Per Gallon", main = "MPG vs Displacement")
plot(data$carb, data$mpg, xlab = "Number of Carburetors", ylab = "Miles Per Gallon", main = "MPG vs Number of Carburetors")
mtext("Figure 1 - Relationship between MPG and Predictors", side = 1, outer = TRUE, cex = 1.2)

[Figure 1 - Relationship between MPG and Predictors]

par(mfrow=c(2,2),oma = c(3, 0, 0, 0))
plot(fit4)
mtext("Figure 2 - More Residual Plots", side=1, outer = TRUE, cex = 1.2)

[Figure 2 - More Residual Plots]

suppressWarnings(library(car))
par(mfrow=c(2,2),oma = c(3, 0, 0, 0))
leveragePlots(fit4, main="")
mtext("Figure 3 - Leverage Plots", side=1, outer = TRUE, cex = 1.2)

[Figure 3 - Leverage Plots]