Introduction

In this brief report we try to answer a question that matters to every car enthusiast: is an automatic or a manual transmission better for fuel economy (miles per gallon, MPG), and how can the difference be quantified? Regression analysis is well suited to this kind of question, and we use the standard R dataset mtcars to show it.

Loading the necessary R packages

First we load the R packages we need: datasets, which provides mtcars, and ggplot2 and GGally, which provide the pair-wise correlation plots and box plots used for quick data exploration.

suppressMessages(library(datasets)) # to load mtcars
suppressMessages(library(ggplot2)) # plotting engine that GGally builds on
suppressMessages(library(GGally)) # for pair-wise correlation and boxplot
data(mtcars)
head(mtcars)
##                    mpg cyl disp  hp drat    wt  qsec vs am gear carb
## Mazda RX4         21.0   6  160 110 3.90 2.620 16.46  0  1    4    4
## Mazda RX4 Wag     21.0   6  160 110 3.90 2.875 17.02  0  1    4    4
## Datsun 710        22.8   4  108  93 3.85 2.320 18.61  1  1    4    1
## Hornet 4 Drive    21.4   6  258 110 3.08 3.215 19.44  1  0    3    1
## Hornet Sportabout 18.7   8  360 175 3.15 3.440 17.02  0  0    3    2
## Valiant           18.1   6  225 105 2.76 3.460 20.22  1  0    3    1

Basic Data exploration

In this section we first use the ggpairs function from the GGally package to get a quick overview of the pair-wise correlations between all the variables. The aim is to detect variables (potential confounders) that are linked both to mpg and to the transmission type, since ignoring them could lead to false conclusions.

# Custom ggpairs panel: scatter plot plus a smoothed trend line
my_fn <- function(data, mapping, method = "loess", ...) {
  p <- ggplot(data = data, mapping = mapping) +
    geom_point() +
    geom_smooth(method = method, ...)
  p
}

# Lower panels use my_fn with its default loess curve
ggpairs(mtcars, lower = list(continuous = my_fn))

Treating the categorical variables as continuous for now, the plot reveals a very high level of inter-correlation between the variables. For example, am (1 for a manual transmission, 0 for an automatic) is correlated with mpg with r = 0.6, yet six predictors have a larger absolute correlation with mpg: cyl (0.85), disp (0.84), hp (0.77), drat (0.68), wt (0.86) and vs (0.66); the correlations for cyl, disp, hp and wt are negative. Any conclusion about the link between mpg and am should therefore be adjusted for these predictors to avoid false conclusions.
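The screening described above can also be done numerically. The sketch below assumes a fresh copy of the data taken from `datasets::mtcars`, so that am still has its original 0/1 coding; the names `mt`, `r_mpg` and `stronger` are illustrative only.

```r
# Correlation of every variable with mpg, using the original numeric coding
# of am (0 = automatic, 1 = manual) from the datasets package
mt <- datasets::mtcars
r_mpg <- cor(mt)[, "mpg"]              # mpg column of the correlation matrix
r_mpg <- r_mpg[names(r_mpg) != "mpg"]  # drop mpg's correlation with itself

# Keep the predictors whose absolute correlation with mpg exceeds that of am
stronger <- abs(r_mpg[abs(r_mpg) > abs(r_mpg[["am"]])])
round(sort(stronger, decreasing = TRUE), 2)
```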

Before continuing the analysis, we convert the numeric am variable (coded 0/1) into a categorical variable using factor.

mtcars$am <- factor(mtcars$am,labels=c('Automatic','Manual'))
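factor assigns the labels to the sorted levels of the original values, so 0 becomes "Automatic" and 1 becomes "Manual". A quick cross-tabulation can confirm the mapping; this sketch works on a separate copy from `datasets::mtcars`, and `am_fac` is an illustrative name.

```r
# Cross-tabulate the original 0/1 codes against the new factor labels
mt <- datasets::mtcars
am_fac <- factor(mt$am, labels = c("Automatic", "Manual"))
table(original = mt$am, recoded = am_fac)
```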

With other variables ignored, manual transmission is better for mpg

We now address the two questions, first considering the transmission type alone.

Is an automatic or manual transmission better for MPG?

A Welch two-sample t-test gives a first answer:

t.test(mtcars$mpg~mtcars$am)
## 
##  Welch Two Sample t-test
## 
## data:  mtcars$mpg by mtcars$am
## t = -3.7671, df = 18.332, p-value = 0.001374
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -11.280194  -3.209684
## sample estimates:
## mean in group Automatic    mean in group Manual 
##                17.14737                24.39231

The p-value is 0.001374, well below 0.05, so we reject the hypothesis that the difference in means is 0. The mean mileage is 17.15 mpg for automatic transmissions and 24.39 mpg for manual ones: manual transmission appears better for mileage.
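To make the test less of a black box, the Welch statistic and its degrees of freedom can be recomputed by hand: t = (mean_A - mean_M) / sqrt(s_A^2/n_A + s_M^2/n_M), with the Welch-Satterthwaite approximation for the degrees of freedom. This is a minimal sketch on a copy from `datasets::mtcars`; the variable names are illustrative.

```r
# Recompute the Welch two-sample t statistic and degrees of freedom by hand
mt <- datasets::mtcars
auto   <- mt$mpg[mt$am == 0]  # automatic transmissions
manual <- mt$mpg[mt$am == 1]  # manual transmissions

v_a <- var(auto) / length(auto)      # squared standard error, automatic mean
v_m <- var(manual) / length(manual)  # squared standard error, manual mean

t_stat   <- (mean(auto) - mean(manual)) / sqrt(v_a + v_m)
df_welch <- (v_a + v_m)^2 /
  (v_a^2 / (length(auto) - 1) + v_m^2 / (length(manual) - 1))

# t.test uses the Welch version by default (var.equal = FALSE)
tt <- t.test(auto, manual)
c(t = t_stat, df = df_welch)
```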

A box plot shows the separation between the two groups more clearly.

boxplot(mpg~am, data = mtcars,
        xlab = "Transmission",
        ylab = "Miles per Gallon",
        main = "MPG by Transmission Type")
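The numbers behind the box plot can be printed instead of drawn: with plot = FALSE, boxplot returns the statistics it would plot. This sketch assumes a separate copy of the data from `datasets::mtcars`; `bp` is an illustrative name.

```r
# Statistics behind the box plot, computed without drawing it
mt <- datasets::mtcars
mt$am <- factor(mt$am, labels = c("Automatic", "Manual"))
bp <- boxplot(mpg ~ am, data = mt, plot = FALSE)

# Rows: lower whisker, lower hinge, median, upper hinge, upper whisker
bp$stats
bp$n  # number of cars in each group
```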

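par(mfrow = c(2, 2)) # 2 x 2 panel layout for subsequent base-graphics plots, e.g. the four plot() diagnostics of an lm fit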

How can we quantify the difference?

A simple linear regression with am as the only predictor gives a linear model mpg = f(am):

model1 <- lm(mpg ~ am, data = mtcars)
summary(model1)
## 
## Call:
## lm(formula = mpg ~ am, data = mtcars)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -9.3923 -3.0923 -0.2974  3.2439  9.5077 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   17.147      1.125  15.247 1.13e-15 ***
## amManual       7.245      1.764   4.106 0.000285 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 4.902 on 30 degrees of freedom
## Multiple R-squared:  0.3598, Adjusted R-squared:  0.3385 
## F-statistic: 16.86 on 1 and 30 DF,  p-value: 0.000285

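The fitted model is mpg = 17.147 + 7.245 amManual, where amManual is 1 for a manual transmission and 0 otherwise. The intercept is the mean mpg of automatic cars, and the coefficient 7.245 quantifies the difference between the two transmissions. The regression therefore answers both questions at once: the sign of the coefficient says which transmission is better, and its size says by how much.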
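With a single two-level factor, the least-squares coefficients are exactly the group means in disguise: the intercept is the mean of the reference level (Automatic) and amManual is the difference between the two means, which is why the slope matches the t-test means. A minimal check, assuming a fresh copy of the data from `datasets::mtcars`; `fit` and `grp_means` are illustrative names.

```r
# Compare the coefficients of mpg ~ am with the two group means
mt <- datasets::mtcars
mt$am <- factor(mt$am, labels = c("Automatic", "Manual"))
fit <- lm(mpg ~ am, data = mt)
grp_means <- tapply(mt$mpg, mt$am, mean)

# Intercept = mean mpg of the reference level (Automatic)
all.equal(unname(coef(fit)["(Intercept)"]), unname(grp_means["Automatic"]))
# amManual = mean(Manual) - mean(Automatic)
all.equal(unname(coef(fit)["amManual"]),
          unname(grp_means["Manual"] - grp_means["Automatic"]))
```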

However, this model explains only about 36% of the variance in mpg (Multiple R-squared = 0.36). We can do better by taking other predictors into account!
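R-squared is the share of the total variation in mpg captured by the model, 1 - SS_res / SS_tot. For a regression on a single predictor it also equals the squared correlation, which links the 36% back to the r = 0.6 seen in the pair plot. A sketch on a copy from `datasets::mtcars`, keeping am numeric (a 0/1 predictor gives the same fit as the factor); names are illustrative.

```r
# R-squared from sums of squares, compared with summary() and cor()^2
mt  <- datasets::mtcars
fit <- lm(mpg ~ am, data = mt)

ss_res <- sum(residuals(fit)^2)             # variation left unexplained
ss_tot <- sum((mt$mpg - mean(mt$mpg))^2)    # total variation around the mean
r2 <- 1 - ss_res / ss_tot

c(by_hand = r2,
  summary = summary(fit)$r.squared,
  cor_squared = cor(mt$mpg, mt$am)^2)
```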

Conclusion

Through model construction, we have shown that, after adjusting for other strong mpg predictors available in the mtcars dataset, manual transmission is still the better transmission for mpg, with mpg = 9.61 + 2.93 amManual - 3.91 wt + 1.22 qsec: at equal weight and quarter-mile time, a manual car gets about 2.93 more miles per gallon.
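The equation quoted in the conclusion can be refit as below. This assumes it is an ordinary least-squares fit of mpg on am, wt and qsec, since the model-selection steps are not reproduced here; `fit3` is an illustrative name.

```r
# Refit the adjusted model: transmission plus weight and quarter-mile time
mt <- datasets::mtcars
mt$am <- factor(mt$am, labels = c("Automatic", "Manual"))
fit3 <- lm(mpg ~ am + wt + qsec, data = mt)

coef(fit3)                    # intercept and slopes of the adjusted model
summary(fit3)$adj.r.squared   # compare with the single-predictor model
```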