Introduction


This data report studies the relationship between fuel consumption (outcome variable) and 10 aspects of automobile design and performance (predictors) for 32 automobiles in the data set named mtcars extracted from the 1974 Motor Trend US magazine, in order to answer two questions:
 
1. Is an automatic or manual transmission better for MPG?
2. What is the MPG difference between automatic and manual transmissions?
 
A linear regression model is used to explore these questions.


Data Manipulation


library(dplyr)
library(GGally)
library(ggplot2)

data(mtcars)
str(mtcars)
mydata <- mtcars %>% 
  mutate(transmission = ifelse(am == 1, "manual", "automatic")) %>% 
  mutate(cyl = as.factor(cyl), 
         vs = as.factor(vs), 
         gear = as.factor(gear), 
         carb = as.factor(carb), 
         transmission = as.factor(transmission) ) %>% 
  select(-am)  %>% 
  print
 

The mtcars data set contains 11 variables in total. To prepare for the analysis, 5 categorical variables (cyl, vs, gear, carb, am) are converted to factors. In addition, to avoid confusion, the values of am are replaced by descriptive labels in a new variable named transmission (in mtcars the number “0” represents automatic transmission and the number “1” represents manual transmission), and the original am column is dropped. The processed data are stored in a data frame named mydata.
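A quick cross-tabulation can confirm that the relabelling preserves the original coding. The sketch below uses base R only and rebuilds the labels itself, so it does not depend on mydata; the name transmission_check is illustrative and not part of the analysis above.

```r
data(mtcars)

# Recode am (0 = automatic, 1 = manual) into a labelled factor
transmission_check <- factor(mtcars$am, levels = c(0, 1),
                             labels = c("automatic", "manual"))

# Every 0 should fall under "automatic" and every 1 under "manual"
table(am = mtcars$am, transmission = transmission_check)
```

Any non-zero count off the diagonal of this table would point to a recoding mistake.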


Exploratory Analysis


my_fn <- function(data, mapping, ...){
  p <- ggplot(data = data, mapping = mapping) + 
    geom_point() + 
    geom_smooth(method = loess, fill = "red", color = "red", ...) +
    geom_smooth(method = lm, fill = "blue", color = "blue", ...)
  p
}
scatter.plot <- ggpairs(mtcars, lower = list(continuous = my_fn),
                        title = "Figure 1: Scatter Plot") +
  theme(plot.title = element_text(size = 18, face = "bold", color = "blue", 
                                  hjust = 0.5) ) +
  theme(plot.margin = unit(rep(0.2, 4), "in") )
scatter.plot
box.plot <- ggplot(mydata, aes(x = transmission, y = mpg, fill = transmission)) +
  geom_boxplot(width = 0.6) +
  ggtitle("Figure 2: Fuel Consumption Comparison") +
  labs(y = "Fuel Consumption (miles/gallon)", x = "Transmission",
       fill = "Transmission") +
  scale_y_continuous(breaks = seq(0, 35, by = 5) ) +
  theme(plot.title = element_text(size = 15, face = "bold", color = "blue", 
                                  hjust = 0.5, vjust = 2))  +
  theme(axis.title = element_text(size = 12) )  +
  theme(axis.text = element_text(size = 10) ) +
  theme(legend.title = element_text(size = 10),
        legend.text = element_text(size = 10 ) ) +
  theme(plot.margin = unit(rep(0.3, 4), "in") ) +
  coord_flip()
box.plot
 

All figures of this report can be found in the appendix.
 
Figure 1 shows the pairwise relationships between the variables in the data set. Since we are most concerned with the relationship between fuel consumption and transmission type, we also draw a box plot (Figure 2) to compare the fuel consumption of cars with automatic and manual transmissions.
 
From Figure 2, it seems that cars with manual transmissions use fuel more efficiently (more miles per gallon) than those with automatic transmissions.
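The visual impression from the box plot can be put into numbers by comparing group means. The following is a minimal sketch in base R; the names group_means and unadjusted_diff are illustrative. Note that this difference is unadjusted: it ignores every other variable, which is why a regression model is still needed.

```r
data(mtcars)

# Label the transmission type (0 = automatic, 1 = manual)
transmission <- factor(mtcars$am, levels = c(0, 1),
                       labels = c("automatic", "manual"))

# Mean mpg within each transmission group
group_means <- tapply(mtcars$mpg, transmission, mean)
group_means

# Unadjusted difference in mean mpg: manual minus automatic
unadjusted_diff <- unname(group_means["manual"] - group_means["automatic"])
unadjusted_diff
```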


Regression Model


1. Full Model

shapiro.test(mydata$mpg)$p.value  #normality is satisfied
fit <- lm(mpg ~ ., mydata)
 

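Before fitting the linear model, the normality of the outcome is tested. The p-value of the Shapiro-Wilk test is 0.123, so the hypothesis that mpg is normally distributed is not rejected at the 0.05 level. Strictly speaking, the linear model assumes normally distributed errors rather than a normally distributed outcome, so this is only a preliminary check; the residuals themselves are examined in the Model Diagnostics section.
 
A full model including all predictors is then fitted. This model is unlikely to be the most parsimonious one, so stepwise model selection based on the AIC criterion is conducted as follows. The best model is taken to be the one with the smallest AIC value.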
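To make the criterion concrete: AIC is minus twice the maximized log-likelihood plus twice the number of estimated parameters. The sketch below recomputes it by hand for the full model and checks it against AIC(). It rebuilds the processed data with base R (the names d, full and aic_manual are illustrative). It also shows why the values printed by step() differ from those returned by AIC(): for a linear model, step() relies on extractAIC(), which drops constant terms.

```r
data(mtcars)

# Rebuild the processed data with base R (mirrors the dplyr pipeline)
d <- mtcars
d$transmission <- factor(d$am, levels = c(0, 1),
                         labels = c("automatic", "manual"))
d$am <- NULL
for (v in c("cyl", "vs", "gear", "carb")) d[[v]] <- factor(d[[v]])

full <- lm(mpg ~ ., data = d)

# AIC = -2 * log-likelihood + 2 * (number of estimated parameters)
ll <- logLik(full)
k <- attr(ll, "df")  # regression coefficients plus the error variance
aic_manual <- -2 * as.numeric(ll) + 2 * k
all.equal(aic_manual, AIC(full))

# extractAIC() omits the constant n * (1 + log(2 * pi)) and does not
# count the error variance as a parameter, hence the extra 2
n <- nrow(d)
all.equal(AIC(full), extractAIC(full)[2] + n * (1 + log(2 * pi)) + 2)
```

Because the two scales differ only by a constant for a fixed data set, they rank candidate models identically.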


2. Model Selection

best <- step(fit)
summary(best)
AIC(best, fit)
confint(best)["transmissionmanual", ]
library(car)
vif(best)
 

The best model includes 4 variables, namely cyl, hp, wt and transmission. It can be written as:
\[mpg_i = 33.7 - 3.03 \times cyl6_i - 2.16 \times cyl8_i - 0.032 \times hp_i - 2.49 \times wt_i + 1.81 \times transmissionmanual_i + \epsilon_i,\]

where \(\epsilon_i \sim N(0, 2.41^2)\) and \(Cov(\epsilon_i, \epsilon_j) = 0\) for any \(i \neq j.\)

The p-value of the F statistic is 1.5e-10, so the model as a whole is highly significant. The R-squared value is 0.8659, which indicates that 86.59% of the total variance in fuel efficiency (mpg) around its mean is accounted for by cyl, hp, wt and transmission. Model selection reduced the AIC from 169.22 in the full model to 154.47 in the best model.
 
The interpretation of the coefficient of transmissionmanual in the best model is essential to answering our questions of interest. The coefficient value of 1.81 indicates that, holding the remaining variables (cyl, hp, wt) constant, cars with manual transmissions are estimated to travel 1.81 more miles per gallon than cars with automatic transmissions.

However, the p-value for the corresponding partial t-test is 0.21, implying that the difference in fuel efficiency between manual and automatic transmissions is not significant at an alpha level of 0.05.

The 95% confidence interval for the coefficient of transmissionmanual is (-1.06, 4.68). Thus, we are 95% confident that the difference in fuel efficiency between manual and automatic transmissions lies between -1.06 mpg and 4.68 mpg when the remaining variables (cyl, hp, wt) are held constant. Because the interval contains 0, it agrees with the result of the t-test.
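The interval can be reproduced by hand as the estimate plus or minus the 0.975 quantile of the t distribution (on the residual degrees of freedom) times the standard error. The following is a minimal sketch that fits the reported best model directly, rather than through step(), using base R only; the names d, m and ci_manual are illustrative.

```r
data(mtcars)

# Minimal version of the processed data needed for the selected model
d <- mtcars
d$transmission <- factor(d$am, levels = c(0, 1),
                         labels = c("automatic", "manual"))
d$cyl <- factor(d$cyl)

# The model reported as best: mpg ~ cyl + hp + wt + transmission
m <- lm(mpg ~ cyl + hp + wt + transmission, data = d)

coefs <- summary(m)$coefficients
est <- coefs["transmissionmanual", "Estimate"]
se  <- coefs["transmissionmanual", "Std. Error"]

# 95% interval: estimate +/- t quantile (residual df) * standard error
t_crit <- qt(0.975, df = df.residual(m))
ci_manual <- c(est - t_crit * se, est + t_crit * se)
ci_manual

# Should agree with the built-in computation
confint(m)["transmissionmanual", ]
```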


3. Model Diagnostics

library(ggfortify)
diagnostic.plot <- autoplot(best, label.size = 3, which = 1:6, ncol = 3) 
diagnostic.plot
 

Figure 3 includes the residual plot and the QQ plot. The residual plot has no obvious pattern: the residuals are centered around 0 and their spread is approximately constant across the fitted values. The QQ plot shows that the standardized residuals lie close to the theoretical quantiles. Thus, the residuals appear to be approximately normally distributed and homoscedastic, and the assumptions of the linear model appear to be satisfied.
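The visual reading of the QQ plot can be complemented by a formal test applied to the residuals rather than to the outcome. This is an optional sketch, not part of the original analysis: it refits the reported best model with base R (the names d, m and res_test are illustrative) and runs the Shapiro-Wilk test on its residuals. A p-value above 0.05 would be consistent with the QQ plot.

```r
data(mtcars)

# Minimal version of the processed data needed for the selected model
d <- mtcars
d$transmission <- factor(d$am, levels = c(0, 1),
                         labels = c("automatic", "manual"))
d$cyl <- factor(d$cyl)

m <- lm(mpg ~ cyl + hp + wt + transmission, data = d)

# Shapiro-Wilk test on the model residuals (null hypothesis: normality)
res_test <- shapiro.test(residuals(m))
res_test$p.value
```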

The VIF values for the variables in the best model are small, indicating that there is no obvious multicollinearity.
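For a single-coefficient predictor, the VIF equals 1 / (1 - R²), where R² comes from regressing that predictor on all the other predictors. The sketch below computes it this way for hp and wt using base R only; the helper vif_manual is hypothetical and written for illustration. For models containing multi-level factors such as cyl, car::vif() reports generalized VIFs, which coincide with the ordinary VIF for one-degree-of-freedom terms like hp and wt.

```r
data(mtcars)

# Minimal version of the processed data needed for the selected model
d <- mtcars
d$transmission <- factor(d$am, levels = c(0, 1),
                         labels = c("automatic", "manual"))
d$cyl <- factor(d$cyl)

# Hypothetical helper: VIF of one numeric predictor given the others
vif_manual <- function(response, others, data) {
  f <- reformulate(others, response = response)
  1 / (1 - summary(lm(f, data = data))$r.squared)
}

vif_hp <- vif_manual("hp", c("cyl", "wt", "transmission"), d)
vif_wt <- vif_manual("wt", c("cyl", "hp", "transmission"), d)
c(hp = vif_hp, wt = vif_wt)
```

Values close to 1 mean a predictor is nearly uncorrelated with the others; large values signal that its coefficient is poorly determined.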


Executive Summary


Using a linear regression model, we studied the relationship between fuel consumption and transmission type. From the best model, which adjusts for cyl, hp and wt, we can draw the following conclusions:
 
1. The difference in fuel efficiency between manual and automatic transmission cars is not significant at an alpha level of 0.05.
 
2. We are 95% confident that the difference in fuel efficiency between manual and automatic transmission cars lies between -1.06 mpg and 4.68 mpg, with a point estimate of 1.81 mpg in favor of manual transmissions.


Appendix


Figure 1: Scatter Plot


Figure 2: Fuel Consumption Comparison


Figure 3: Model Diagnostics