Motor Trend, a magazine about the automobile industry, is interested in exploring the relationship between a set of variables and miles per gallon (MPG) (outcome). They are particularly interested in the following two questions: “Is an automatic or manual transmission better for MPG”, and “Quantifying how different is the MPG between automatic and manual transmissions?”.
Using linear regression on the mtcars dataset, we found that on average a car with a manual transmission gets about 1.81 more miles per gallon than a car with an automatic transmission when the number of cylinders, horsepower, and weight are held constant. The 95% confidence interval for this difference, [-1.0609, 4.6794], includes zero, so the difference is not statistically significant at the 5% level.
The mtcars dataset contains 32 observations of 11 variables: the outcome, miles per gallon (mpg), and 10 predictors. Figure 1 (appendix) shows the relationship between the outcome and some of the predictors. The first three observations of the dataset are shown below. Since the predictors cyl, vs, am, gear and carb are categorical, they were converted to factors. More figures are shown in the appendix.
#Loading the mtcars dataset
data(mtcars)
data <- mtcars
#Displaying the first three observations of the dataset
head(data, n=3)
##                mpg cyl disp  hp drat    wt  qsec vs am gear carb
## Mazda RX4     21.0   6  160 110 3.90 2.620 16.46  0  1    4    4
## Mazda RX4 Wag 21.0   6  160 110 3.90 2.875 17.02  0  1    4    4
## Datsun 710    22.8   4  108  93 3.85 2.320 18.61  1  1    4    1
#Converting categorical predictors to factors
data$cyl <- factor(data$cyl)
data$vs <- factor(data$vs)
data$am <- factor(data$am, labels = c("Automatic", "Manual"))
data$gear <- factor(data$gear)
data$carb <- factor(data$carb)
Forward model selection was used. Starting from a model with transmission type alone, predictors were added one at a time to form a sequence of nested models, and anova was then used to compare them and select the final model.
#Each model adds one predictor to the previous one, so the models are nested
fit1 <- lm(mpg~am, data = data)
fit2 <- update(fit1, . ~ . + cyl)
fit3 <- update(fit2, . ~ . + hp)
fit4 <- update(fit3, . ~ . + wt)
fit5 <- update(fit4, . ~ . + carb)
fit6 <- update(fit5, . ~ . + gear)
fit7 <- update(fit6, . ~ . + vs)
fit8 <- update(fit7, . ~ . + qsec)
fit9 <- update(fit8, . ~ . + disp)
anova(fit1, fit2, fit3, fit4, fit5, fit6, fit7, fit8, fit9)
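As a minimal sketch of how the nested comparison can be read, the block below refits the first four models and pulls the p-value of each added term out of the anova table. The names `d`, `m1` to `m4`, and `tab` are illustrative and are not part of the original analysis; the anova output of the report itself is not reproduced here.

```r
# Illustrative sketch: d, m1..m4 and tab are assumed names, not from the report
d <- mtcars
d$cyl <- factor(d$cyl)
d$am  <- factor(d$am, labels = c("Automatic", "Manual"))

# Nested sequence: each model adds one term to the previous one
m1 <- lm(mpg ~ am, data = d)
m2 <- update(m1, . ~ . + cyl)
m3 <- update(m2, . ~ . + hp)
m4 <- update(m3, . ~ . + wt)

# Each row after the first tests whether the newly added term reduces
# the residual sum of squares by more than chance alone would explain
tab <- anova(m1, m2, m3, m4)
tab[["Pr(>F)"]]  # first entry is NA; the rest are p-values for cyl, hp, wt
```

A small p-value in a row suggests keeping the term added in that row; a large one suggests the simpler model is adequate.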
The coefficients of the selected model, fit4<-lm(mpg~am+cyl+hp+wt, data = data), are shown below. On average, a car with a manual transmission gets about 1.81 more miles per gallon than a car with an automatic transmission, given that the number of cylinders, horsepower, and weight are held constant. On the other hand, each of the remaining predictors has a negative coefficient, given that the other predictors are held constant: 6- and 8-cylinder cars get fewer miles per gallon than the 4-cylinder baseline, and MPG decreases as horsepower and weight increase. The relationships between the outcome and the four predictors are shown in Figure 1 (appendix).
fit4$coef
## (Intercept)    amManual        cyl6        cyl8          hp          wt
##    33.70832     1.80921    -3.03134    -2.16368    -0.03211    -2.49683
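To illustrate what "held constant" means for the amManual coefficient, the sketch below predicts MPG for two hypothetical cars that differ only in transmission type. The data frame `cars` and its values (6 cylinders, 110 hp, weight 2.8) are made up for illustration, and `d` and `m4` are assumed names for the prepared data and the selected model.

```r
# Illustrative sketch: d, m4, cars and pred are assumed names
d <- mtcars
d$cyl <- factor(d$cyl)
d$am  <- factor(d$am, labels = c("Automatic", "Manual"))
m4 <- lm(mpg ~ am + cyl + hp + wt, data = d)

# Two hypothetical cars, identical except for transmission type
cars <- data.frame(
  am  = factor(c("Automatic", "Manual"), levels = levels(d$am)),
  cyl = factor(c("6", "6"), levels = levels(d$cyl)),
  hp  = c(110, 110),
  wt  = c(2.8, 2.8)
)

pred <- predict(m4, newdata = cars)
diff(pred)            # predicted MPG gap between the two cars
coef(m4)["amManual"]  # the gap equals this coefficient
```

Because the model has no interaction terms, the gap is the same whatever values are chosen for cyl, hp, and wt.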
The residual standard error [summary(fit4)$sigma] = 2.4101.
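The quantity returned by `summary(fit)$sigma` is the square root of the residual sum of squares divided by the residual degrees of freedom. A short check, assuming the same data preparation as above (`d`, `m4`, `rss`, and `sigma_hat` are illustrative names):

```r
# Illustrative sketch: d, m4, rss and sigma_hat are assumed names
d <- mtcars
d$cyl <- factor(d$cyl)
d$am  <- factor(d$am, labels = c("Automatic", "Manual"))
m4 <- lm(mpg ~ am + cyl + hp + wt, data = d)

rss <- sum(resid(m4)^2)                    # residual sum of squares
sigma_hat <- sqrt(rss / df.residual(m4))   # 32 observations - 6 coefficients = 26 df
all.equal(sigma_hat, summary(m4)$sigma)    # the two computations agree
```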
PRESS residuals [resid(fit4)/(1-hatvalues(fit4))]= -1.8438, -1.0338, -4.4354, 2.7971, 1.5804, -0.6873, -0.5782, 0.8092, 0.0092, 1.3012, -0.4148, 0.8786, 0.9437, -1.2493, -1.9375, -0.9887, 5.2594, 5.185, 0.697, 5.7909, -4.0695, -2.9547, -3.6121, -0.9331, 3.2001, -1.4473, -1.3991, 2.777, -1.5095, -0.3264, 2.4903, -4.4
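The PRESS residual of an observation is its prediction error when the model is refitted without that observation. The sketch below compares the shortcut formula with a brute-force refit for the first car; `d`, `m4`, `press`, `m_loo`, and `loo_resid` are illustrative names, not part of the original analysis.

```r
# Illustrative sketch: all object names here are assumptions
d <- mtcars
d$cyl <- factor(d$cyl)
d$am  <- factor(d$am, labels = c("Automatic", "Manual"))
m4 <- lm(mpg ~ am + cyl + hp + wt, data = d)

# Shortcut: ordinary residual divided by (1 - leverage)
press <- resid(m4) / (1 - hatvalues(m4))

# Brute force for observation 1: refit without it, then predict it
i <- 1
m_loo <- lm(mpg ~ am + cyl + hp + wt, data = d[-i, ])
loo_resid <- d$mpg[i] - predict(m_loo, newdata = d[i, ])

all.equal(unname(press[i]), unname(loo_resid))  # both routes give the same value

# The PRESS statistic is the sum of the squared PRESS residuals
sum(press^2)
```

The shortcut avoids refitting the model 32 times, which is why it is the form used in the text.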
The density and normal Q-Q plots of the residuals suggest that the residuals are approximately normal. Figure 2 shows more residual plots, and Figure 3 shows leverage plots (appendix).
co<-summary(fit4)$coefficients
par(mfrow=c(1,2))
#plot(fit4)
plot(density(resid(fit4)), main="Density of Residuals")
qqnorm(resid(fit4))
summary(fit4)$coefficients
##             Estimate Std. Error t value  Pr(>|t|)
## (Intercept) 33.70832    2.60489 12.9404 7.733e-13
## amManual     1.80921    1.39630  1.2957 2.065e-01
## cyl6        -3.03134    1.40728 -2.1540 4.068e-02
## cyl8        -2.16368    2.28425 -0.9472 3.523e-01
## hp          -0.03211    0.01369 -2.3450 2.693e-02
## wt          -2.49683    0.88559 -2.8194 9.081e-03
95% confidence intervals: Intercept = [28.3539, 39.0627], amManual = [-1.0609, 4.6794], cyl6 = [-5.9241, -0.1386], cyl8 = [-6.859, 2.5317], hp = [-0.0603, -0.004], wt = [-4.3172, -0.6765].
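Intervals of this kind can be obtained with `confint()`, or by hand as estimate ± t quantile × standard error. The sketch below does both, assuming a 95% level and the same data preparation as above; `d`, `m4`, `ci`, `est`, `se`, `tcrit`, and `manual` are illustrative names.

```r
# Illustrative sketch: all object names here are assumptions
d <- mtcars
d$cyl <- factor(d$cyl)
d$am  <- factor(d$am, labels = c("Automatic", "Manual"))
m4 <- lm(mpg ~ am + cyl + hp + wt, data = d)

# Built-in t-based intervals
ci <- confint(m4, level = 0.95)

# The same intervals by hand
est   <- coef(summary(m4))[, "Estimate"]
se    <- coef(summary(m4))[, "Std. Error"]
tcrit <- qt(0.975, df.residual(m4))   # two-sided 95% critical value
manual <- cbind(est - tcrit * se, est + tcrit * se)

all.equal(unname(ci), unname(manual))  # both routes agree
ci["amManual", ]                       # interval for the transmission effect
```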
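Our model shows that on average a car with a manual transmission gets about 1.81 more miles per gallon than a car with an automatic transmission, given that the number of cylinders, horsepower, and weight are held constant. The uncertainty about this result is captured by the 95% confidence interval [-1.0609, 4.6794]. Because the interval contains zero (p = 0.2065), the data do not show a statistically significant difference in MPG between the two transmission types once these variables are accounted for.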
par(mfrow = c(3,2),oma = c(3, 0, 0, 0))
plot(data$cyl, data$mpg, xlab ="Number of Cylinders", ylab = "Miles Per Gallon", main = "MPG vs Number of Cylinders")
plot(data$am, data$mpg, xlab ="Transmission Type", ylab = "Miles Per Gallon", main = "MPG vs Transmission Type")
plot(data$hp, data$mpg, xlab ="Gross Horsepower", ylab = "Miles Per Gallon",main = "MPG vs Gross Horsepower")
plot(data$wt, data$mpg, xlab ="Car's Weight (lb/1000)", ylab = "Miles Per Gallon",main = "MPG vs Car's Weight")
plot(data$disp, data$mpg, xlab ="Displacement (cu.in.)", ylab = "Miles Per Gallon",main = "MPG vs Displacement")
plot(data$carb, data$mpg, xlab ="Number of carburetors", ylab = "Miles Per Gallon",main = "MPG vs Number of carburetors")
mtext("Figure 1 - Relationship between MPG and Predictors", side=1,outer = TRUE, cex = 1.2)
par(mfrow=c(2,2),oma = c(3, 0, 0, 0))
plot(fit4)
mtext("Figure 2 - More Residual Plots", side=1, outer = TRUE, cex = 1.2)
suppressWarnings(library(car))
par(mfrow=c(2,2),oma = c(3, 0, 0, 0))
leveragePlots(fit4, main="")
mtext("Figure 3 - Leverage Plots", side=1, outer = TRUE, cex = 1.2)