Analysis of the relationship between transmission type and miles per gallon (MPG) in the mtcars dataset

This work was created as the course project for the Regression Models course on Coursera (Aug 2014). The assignment addresses the relationship between a car's fuel economy in miles per gallon (MPG) and its type of transmission (automatic or manual), so we have to answer the following basic questions:

  1. Is an automatic or manual transmission better for MPG?
  2. Quantify the MPG difference between automatic and manual transmissions

Summary

To answer these questions we follow the steps below, each of which is explained in detail in its own section:

  1. Load the required libraries, read the dataset and split the data into automatic and manual transmission, so that we can fit a separate regression model for each type of transmission.

  2. Carry out some exploratory data analysis in order to select a predictor variable for the linear regression models. To do this we use correlation matrix plots to visualize the correlations between the candidate predictors, and finally select Weight as the best candidate for the linear regression model of each transmission type.

  3. Finally, apply the fitted linear regression models to predict the results, and run some diagnostics and residual analysis. In this section we use inference to determine how miles per gallon behaves as a function of the selected predictor variable.

Our basic strategy is to fit a linear regression model for each type of transmission using the same predictor variable, and to quantify the difference between automatic and manual cars using the regression parameters obtained.

Initialization and data input.

Loading libraries.

To execute the R code, the following libraries must be loaded.

library(caret)
library(ggplot2)
library(corrgram)
library(gridExtra)

Getting and cleaning data.

We start by reading the mtcars dataset and applying some transformations to generate better visualizations.

# Loading data
data(mtcars)
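The transformation code itself is not shown in this report. The sketch below is a hypothetical reconstruction, assuming only the column names that appear later in the text (`MPG`, `Weight`, `Transmision`); the renaming approach and the factor labels are assumptions. One caveat: the mtcars help page codes `am` as 0 = automatic and 1 = manual, whereas the group sizes and car names printed later in this report match the opposite assignment. The sketch reproduces the report's assignment so that its numbers can be checked; swap the two labels to follow the help page.

```r
# Hypothetical reconstruction of the transformation step (not the original code).
data(mtcars)

# Rename the columns used later in the report (names taken from the text).
names(mtcars)[names(mtcars) == "mpg"] <- "MPG"
names(mtcars)[names(mtcars) == "wt"]  <- "Weight"

# Assumption: labels assigned so that the group sizes match the output shown
# later (19 "Manual" cars, 13 "Automatic" cars). The mtcars help page defines
# am as 0 = automatic, 1 = manual, i.e. the opposite order.
mtcars$Transmision <- factor(mtcars$am, levels = c(0, 1),
                             labels = c("Manual", "Automatic"))

# Group sizes, to compare against the hat value listings later on.
print(table(mtcars$Transmision))
```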

Now we split the data by transmission type.

# Split data : automatic / manual
manual <- mtcars[which(mtcars$Transmision == 'Manual'),]
automatic <- mtcars[which(mtcars$Transmision == 'Automatic'),]

Exploratory data analysis.

We start our exploratory data analysis by comparing fuel economy in miles per gallon across the two transmission types. The result is shown in the next figure:

Figure: miles per gallon (MPG) by transmission type.

Two basic observations can be drawn from this figure:

  1. In general, the automatic transmission type seems to perform better than manual, with a higher median value of miles per gallon, but this could be due to another variable that is not represented in this plot.
  2. Despite this general behaviour, both transmission types show a wide spread of values. In terms of miles per gallon, we therefore have to quantify the difference in order to establish the point from which one type of transmission becomes better than the other.
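The two observations above can also be checked numerically. The following is a minimal sketch that works on the raw `mtcars` columns (`mpg`, `am`) rather than on the transformed names; the object names are illustrative only.

```r
data(mtcars)

# am == 0 is the 19-car group this report labels "manual",
# am == 1 is the 13-car group it labels "automatic".
mpg_by_group <- split(mtcars$mpg, mtcars$am)

# Size, median and range of MPG within each group.
group_summary <- sapply(mpg_by_group, function(x) {
  c(n = length(x), median = median(x), min = min(x), max = max(x))
})
print(group_summary)

# The ranges overlap when the minimum of one group lies below
# the maximum of the other.
ranges_overlap <- group_summary["min", "1"] < group_summary["max", "0"]
print(ranges_overlap)
```

A higher median in one group together with overlapping ranges is exactly the situation described in the two points above.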

To continue the exploratory data analysis, our objective now is to select the predictor variable that will be used in the linear regression models. The next figure shows the correlation matrix, where highly correlated variables stand out in strongly highlighted colours (red for negative and blue for positive correlation values), so we can easily select the candidate predictor.

Figure: correlation matrix for all cars.

From this plot we select Weight as the best predictor for the whole dataset, but we have to check whether this variable is also well correlated with MPG in both cases, manual and automatic transmission. The next figures show the correlation matrices for automatic and manual transmissions, where we can see that Weight is still a good candidate for both transmission types, although in the case of manual transmission there are better candidates (Cylinders, Displacement and HorsePower).

Figures: correlation matrices for the automatic and manual transmission groups.
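The selection described above can be reproduced without graphics by ranking the correlations with MPG directly. The following is a rough sketch using base R `cor()` on the raw `mtcars` columns; the choice of four candidate columns for the per-group table is an assumption based on the variables named in the text.

```r
data(mtcars)

# Correlation of every other variable with mpg, ranked by absolute value.
cors <- cor(mtcars)[, "mpg"]
cors <- cors[names(cors) != "mpg"]
ranked <- cors[order(abs(cors), decreasing = TRUE)]
print(round(ranked, 3))

# The same correlations within each transmission group.
# am == 0 is the group this report labels "manual", am == 1 "automatic".
candidates <- c("wt", "cyl", "disp", "hp")
by_group <- sapply(split(mtcars, mtcars$am), function(d) {
  sapply(candidates, function(v) cor(d$mpg, d[[v]]))
})
print(round(by_group, 3))
```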

Once we have selected the predictor variable, it is time to fit the linear regression models for each type of transmission.

Linear regression model.

The next plot shows miles per gallon versus weight, coloured by transmission type, together with the fitted linear regression line for each type of transmission and its respective smoothing band.

Figure: MPG versus weight, coloured by transmission type, with the fitted regression line for each group.

From this figure we can state that:

  1. For weights below 2,800 lb the automatic transmission gives better mileage, but for weights above 2,800 lb the manual transmission is better.
  2. At the intersection point (represented with a yellow circle in the plot) the weight is 2,800 lb and the predicted value is 20.8 miles per gallon.
  3. We have to consider that automatic cars tend to weigh less, so this seems to be the main cause of their better mileage compared with manual transmission cars.
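The intersection point quoted above follows from setting the two fitted lines equal and solving for the weight. A minimal sketch, assuming simple `mpg ~ wt` fits on the raw `mtcars` columns (the object names are illustrative only):

```r
data(mtcars)

# One simple linear regression per group.
# am == 0 is the group this report labels "manual", am == 1 "automatic".
fit0 <- lm(mpg ~ wt, data = mtcars, subset = am == 0)
fit1 <- lm(mpg ~ wt, data = mtcars, subset = am == 1)
b0 <- coef(fit0)
b1 <- coef(fit1)

# a0 + s0 * w = a1 + s1 * w   =>   w = (a1 - a0) / (s0 - s1)
w_cross   <- unname((b1[1] - b0[1]) / (b0[2] - b1[2]))
mpg_cross <- unname(b0[1] + b0[2] * w_cross)

# Weight is in units of 1000 lb.
print(round(c(weight = w_cross, mpg = mpg_cross), 2))
```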

Now we can carry out some inference using linear regression models with a more interpretable formulation, in which the weight is centered at the mean weight of each group.

# Manual transmission linear regression model.
manual.weight.mean <- mean(manual$Weight)
manual.model.lm <- lm(MPG ~ I(Weight - manual.weight.mean), data = manual)

# Coefficient summary for the manual model
summary(manual.model.lm)$coefficients
##                                Estimate Std. Error t value  Pr(>|t|)
## (Intercept)                      17.147     0.5800  29.563 4.667e-16
## I(Weight - manual.weight.mean)   -3.786     0.7666  -4.939 1.246e-04

From this result we can state that for manual transmission cars:

  1. An increase of 1,000 lb in the weight of the car produces a decrease of 3.786 miles per gallon.
  2. 17.147 is the expected miles per gallon at the average weight of 3,768.9 lb.

# Automatic transmission linear regression model.
automatic.weight.mean <-  mean(automatic$Weight)
automatic.model.lm <- lm(MPG ~ I(Weight - automatic.weight.mean), data = automatic)

# Coefficient summary for the automatic model
summary(automatic.model.lm)$coefficients
##                                   Estimate Std. Error t value  Pr(>|t|)
## (Intercept)                         24.392     0.7449  32.747 2.568e-12
## I(Weight - automatic.weight.mean)   -9.084     1.2566  -7.229 1.688e-05

From this result we can state that for automatic transmission cars:

  1. An increase of 1,000 lb in the weight of the car produces a decrease of 9.084 miles per gallon.
  2. 24.392 is the expected miles per gallon at the average weight of 2,411 lb.
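The interpretation of the intercepts relies on a property of centering: subtracting the group mean from the predictor leaves the slope unchanged and makes the intercept equal to the group's mean MPG. The sketch below illustrates this on the 19-car group using the raw `mtcars` columns; the object names are illustrative only.

```r
data(mtcars)

# The 19-car group (am == 0), which this report labels "manual".
g <- subset(mtcars, am == 0)
wt_mean <- mean(g$wt)

fit_raw      <- lm(mpg ~ wt, data = g)
fit_centered <- lm(mpg ~ I(wt - wt_mean), data = g)

# Centering does not change the slope ...
same_slope <- all.equal(unname(coef(fit_raw)[2]),
                        unname(coef(fit_centered)[2]))

# ... and the centered intercept is the mean MPG of the group.
intercept_is_mean <- all.equal(unname(coef(fit_centered)[1]), mean(g$mpg))

print(c(same_slope = isTRUE(same_slope),
        intercept_is_mean = isTRUE(intercept_is_mean)))
```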

The next plot shows the residuals for the automatic and manual transmission groups using the proposed linear regression models.

Figure: residual plots for the automatic and manual transmission models.
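Residuals of this kind can also be inspected numerically. The sketch below computes standardized residuals with `rstandard()` for one group and flags those larger than 2 in absolute value; the cutoff of 2 is a common rule of thumb and an assumption here, not something taken from this report.

```r
data(mtcars)

# Simple regression for the 19-car group (am == 0),
# which this report labels "manual".
fit <- lm(mpg ~ wt, data = mtcars, subset = am == 0)

# Standardized residuals: residuals scaled by their estimated
# standard deviation.
std_res <- rstandard(fit)

# Assumption: |standardized residual| > 2 marks a candidate outlier.
flagged <- std_res[abs(std_res) > 2]
print(round(sort(std_res), 2))
print(names(flagged))
```

Calling `plot(fitted(fit), resid(fit))` on the same model would draw a basic residuals-versus-fitted plot.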

In these plots we cannot see any residual pattern, but we can easily detect a couple of outliers in the predictions. Finally, let's take a look at some influence measures (hat values) for the manual transmission:

hatvalues(manual.model.lm)
##      Hornet 4 Drive   Hornet Sportabout             Valiant 
##             0.08083             0.06258             0.06140 
##          Duster 360           Merc 240D            Merc 230 
##             0.05627             0.08344             0.08784 
##            Merc 280           Merc 280C          Merc 450SE 
##             0.06258             0.06258             0.06097 
##          Merc 450SL         Merc 450SLC  Cadillac Fleetwood 
##             0.05277             0.05264             0.25429 
## Lincoln Continental   Chrysler Imperial       Toyota Corona 
##             0.30445             0.28099             0.20892 
##    Dodge Challenger         AMC Javelin          Camaro Z28 
##             0.05833             0.06288             0.05310 
##    Pontiac Firebird 
##             0.05316

and for the automatic transmission type:

hatvalues(automatic.model.lm)
##      Mazda RX4  Mazda RX4 Wag     Datsun 710       Fiat 128    Honda Civic 
##        0.08649        0.12405        0.07874        0.08667        0.21563 
## Toyota Corolla      Fiat X1-9  Porsche 914-2   Lotus Europa Ford Pantera L 
##        0.14955        0.12652        0.09300        0.25346        0.20304 
##   Ferrari Dino  Maserati Bora     Volvo 142E 
##        0.10514        0.37099        0.10673

The Lincoln Continental and Chrysler Imperial in the manual transmission group, and the Maserati Bora and Lotus Europa in the automatic transmission group, have the largest hat values, that is, the highest leverage on the fitted regression lines.
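To turn the hat values into a concrete list of high-leverage cars, a threshold is needed. The sketch below uses twice the average hat value, a widely used rule of thumb that is an assumption here rather than part of this analysis; `high_leverage` is a hypothetical helper name.

```r
data(mtcars)

# Hypothetical helper: hat values above twice their average.
high_leverage <- function(fit) {
  h <- hatvalues(fit)
  h[h > 2 * mean(h)]
}

# Centering the predictor does not change the hat values, so plain
# mpg ~ wt fits are used.
# am == 0 is the group this report labels "manual", am == 1 "automatic".
fit0 <- lm(mpg ~ wt, data = mtcars, subset = am == 0)
fit1 <- lm(mpg ~ wt, data = mtcars, subset = am == 1)

lev0 <- high_leverage(fit0)
lev1 <- high_leverage(fit1)
print(round(lev0, 3))
print(round(lev1, 3))
```

High leverage alone does not mean an observation distorts the fit; combining it with the residual size, for example through `cooks.distance()`, gives a fuller picture of influence.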