Analysis Of mtcars Data Using Regression Models

Executive Summary

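Motor Trend is a magazine about the automobile industry. Looking at a data set of a collection of cars, they are interested in exploring the relationship between a set of variables and fuel consumption (the outcome). The source data reports consumption in miles per gallon (MPG); this analysis works with the same measure converted to kilometres per litre (KPL).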

They are particularly interested in the following two questions:

  • Is an automatic or manual transmission better for KPL?
  • Quantify the KPL difference between automatic and manual transmissions.

Using hypothesis testing and simple linear regression, we conclude that there is a significant difference between the mean KPL of automatic and manual transmission cars, and hence that manual transmission is better than automatic transmission for KPL.

To confirm our conclusions & to adjust for confounding variables such as the weight and quarter mile time (acceleration) of the car, multivariate regression analysis was run to understand the impact of transmission type on MPG.

The best-fit model indicates that weight and quarter mile time (acceleration) have a significant impact on the KPL difference between automatic and manual transmission cars.

Data

Location

The data was obtained from R (CRAN); its documentation can be found at http://stat.ethz.ch/R-manual/R-devel/library/datasets/html/mtcars.html

Description

The data was extracted from the 1974 Motor Trend US magazine, and comprises fuel consumption and 10 aspects of automobile design and performance for 32 automobiles (1973-74 models).

Format

A data frame with 32 observations on 11 variables

  • [, 1] mpg Miles/(US) gallon
  • [, 2] cyl Number of cylinders
  • [, 3] disp Displacement (cu.in.)
  • [, 4] hp Gross horsepower
  • [, 5] drat Rear axle ratio
  • [, 6] wt Weight (lb/1000)
  • [, 7] qsec 1/4 mile time
  • [, 8] vs V/S
  • [, 9] am Transmission (0 = automatic, 1 = manual)
  • [,10] gear Number of forward gears
  • [,11] carb Number of carburetors

Pre Process

Pre-Requisites

Before you run this Rmd file, set the working directory to your repository:

setwd(your_path)

Load Data

# load data
load("mtcars.RData")

cars <- mtcars
head(cars)
##                   Cylinders Displacement Gross.Horsepower Rear.Axle.Ratio
## Mazda RX4                 6          160              110            3.90
## Mazda RX4 Wag             6          160              110            3.90
## Datsun 710                4          108               93            3.85
## Hornet 4 Drive            6          258              110            3.08
## Hornet Sportabout         8          360              175            3.15
## Valiant                   6          225              105            2.76
##                   Weight Quarter.Mile.Time     Engine.Type Transmission
## Mazda RX4          2.620             16.46        V.Engine       Manual
## Mazda RX4 Wag      2.875             17.02        V.Engine       Manual
## Datsun 710         2.320             18.61 Straight.Engine       Manual
## Hornet 4 Drive     3.215             19.44 Straight.Engine    Automatic
## Hornet Sportabout  3.440             17.02        V.Engine    Automatic
## Valiant            3.460             20.22 Straight.Engine    Automatic
##                   Gears.Number Carburetors.Number Consumtion.Kpl
## Mazda RX4                    4                  4       8.928000
## Mazda RX4 Wag                4                  4       8.928000
## Datsun 710                   4                  1       9.693257
## Hornet 4 Drive               3                  1       9.098057
## Hornet Sportabout            3                  2       7.950171
## Valiant                      3                  1       7.695086
str(cars)
## 'data.frame':    32 obs. of  11 variables:
##  $ Cylinders         : num  6 6 4 6 8 6 8 4 4 6 ...
##  $ Displacement      : num  160 160 108 258 360 ...
##  $ Gross.Horsepower  : num  110 110 93 110 175 105 245 62 95 123 ...
##  $ Rear.Axle.Ratio   : num  3.9 3.9 3.85 3.08 3.15 2.76 3.21 3.69 3.92 3.92 ...
##  $ Weight            : num  2.62 2.88 2.32 3.21 3.44 ...
##  $ Quarter.Mile.Time : num  16.5 17 18.6 19.4 17 ...
##  $ Engine.Type       : Factor w/ 2 levels "V.Engine","Straight.Engine": 1 1 2 2 1 2 1 2 2 2 ...
##  $ Transmission      : Factor w/ 2 levels "Automatic","Manual": 2 2 2 1 1 1 1 1 1 1 ...
##  $ Gears.Number      : Factor w/ 3 levels "3","4","5": 2 2 2 1 1 1 1 2 2 2 ...
##  $ Carburetors.Number: Factor w/ 6 levels "1","2","3","4",..: 4 4 1 1 2 1 4 2 2 4 ...
##  $ Consumtion.Kpl    : num  8.93 8.93 9.69 9.1 7.95 ...
names(mtcars)
##  [1] "Cylinders"          "Displacement"       "Gross.Horsepower"  
##  [4] "Rear.Axle.Ratio"    "Weight"             "Quarter.Mile.Time" 
##  [7] "Engine.Type"        "Transmission"       "Gears.Number"      
## [10] "Carburetors.Number" "Consumtion.Kpl"
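
The file mtcars.RData is not included with this report, so the exact preprocessing is not shown. As a minimal sketch, assuming it was derived from R's built-in datasets::mtcars, a data frame with the same column names could be built as below. The name cars.sketch, the conversion factor of 0.425144 KPL per US mpg, and the factor labels are assumptions chosen to match the output above, not the author's actual script, so values may differ slightly in the last decimals.

```r
# Sketch only: rebuild a data frame shaped like the one loaded above.
# Assumption: 1 US mpg is about 0.425144 km per litre.
mpg.to.kpl <- 0.425144

raw <- datasets::mtcars
cars.sketch <- data.frame(
  Cylinders          = raw$cyl,
  Displacement       = raw$disp,
  Gross.Horsepower   = raw$hp,
  Rear.Axle.Ratio    = raw$drat,
  Weight             = raw$wt,
  Quarter.Mile.Time  = raw$qsec,
  # vs: 0 = V-shaped engine, 1 = straight engine
  Engine.Type        = factor(raw$vs, levels = c(0, 1),
                              labels = c("V.Engine", "Straight.Engine")),
  # am: 0 = automatic, 1 = manual
  Transmission       = factor(raw$am, levels = c(0, 1),
                              labels = c("Automatic", "Manual")),
  Gears.Number       = factor(raw$gear),
  Carburetors.Number = factor(raw$carb),
  Consumtion.Kpl     = raw$mpg * mpg.to.kpl,
  row.names          = rownames(raw)
)

str(cars.sketch)
```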

Basic Data Analysis

Summary

summary(cars)
##    Cylinders      Displacement   Gross.Horsepower Rear.Axle.Ratio
##  Min.   :4.000   Min.   : 71.1   Min.   : 52.0    Min.   :2.760  
##  1st Qu.:4.000   1st Qu.:120.8   1st Qu.: 96.5    1st Qu.:3.080  
##  Median :6.000   Median :196.3   Median :123.0    Median :3.695  
##  Mean   :6.188   Mean   :230.7   Mean   :146.7    Mean   :3.597  
##  3rd Qu.:8.000   3rd Qu.:326.0   3rd Qu.:180.0    3rd Qu.:3.920  
##  Max.   :8.000   Max.   :472.0   Max.   :335.0    Max.   :4.930  
##      Weight      Quarter.Mile.Time          Engine.Type    Transmission
##  Min.   :1.513   Min.   :14.50     V.Engine       :18   Automatic:19   
##  1st Qu.:2.581   1st Qu.:16.89     Straight.Engine:14   Manual   :13   
##  Median :3.325   Median :17.71                                         
##  Mean   :3.217   Mean   :17.85                                         
##  3rd Qu.:3.610   3rd Qu.:18.90                                         
##  Max.   :5.424   Max.   :22.90                                         
##  Gears.Number Carburetors.Number Consumtion.Kpl  
##  3:15         1: 7               Min.   : 4.421  
##  4:12         2:10               1st Qu.: 6.558  
##  5: 5         3: 3               Median : 8.163  
##               4:10               Mean   : 8.541  
##               6: 1               3rd Qu.: 9.693  
##               8: 1               Max.   :14.412
table.kpl.means <- tapply(cars$Consumtion.Kpl, cars$Transmission, mean)
table.kpl.means
## Automatic    Manual 
##  7.290081 10.370215
barplot(
  height = table.kpl.means,
  main = "Average fuel consumption (KPL) by transmission")

# formula variables are looked up in `data`, so no cars$ prefix is needed
boxplot(Consumtion.Kpl ~ Transmission, data = cars,
        col = c("dark grey", "light grey"),
        xlab = "Transmission",
        ylab = "KPL",
        main = "Fuel consumption (KPL) by transmission")

# plot densities (sm.density.compare comes from the sm package)
library(sm)
sm.density.compare(cars$Consumtion.Kpl, cars$Transmission, xlab="Consumption")
title(main="Fuel Consumption by Transmission")

Observation

From the above graphs, the following basic assumptions appear to be met:

  • The distribution of KPL is approximately normal
  • Outliers are not skewing the data
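
The normality assumption is judged visually above. As a hedged complement, a Shapiro-Wilk test per transmission group gives a numerical check. The sketch below assumes the data can be rebuilt from the built-in datasets::mtcars with an assumed conversion factor of 0.425144; the names kpl, transmission and shapiro.p are illustrative only.

```r
# Sketch: Shapiro-Wilk normality test of KPL within each transmission group.
raw <- datasets::mtcars
kpl <- raw$mpg * 0.425144   # assumed mpg -> KPL conversion factor
transmission <- factor(raw$am, levels = c(0, 1),
                       labels = c("Automatic", "Manual"))

# One p-value per group; a small p-value would argue against normality
shapiro.p <- tapply(kpl, transmission, function(x) shapiro.test(x)$p.value)
print(shapiro.p)
```
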
auto.rows <- cars[cars$Transmission == "Automatic",]
manual.rows <- cars[cars$Transmission == "Manual",]
ttest <- t.test(auto.rows$Consumtion.Kpl, manual.rows$Consumtion.Kpl)
ttest
## 
##  Welch Two Sample t-test
## 
## data:  auto.rows$Consumtion.Kpl and manual.rows$Consumtion.Kpl
## t = -3.7671, df = 18.332, p-value = 0.001374
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -4.795694 -1.364574
## sample estimates:
## mean of x mean of y 
##  7.290081 10.370215

Observation

The p-value is 0.0014, so we reject the null hypothesis of equal means and conclude that automatic cars have a lower mean KPL than manual cars. This confirms what the boxplot above shows. However, this conclusion is incomplete without further investigation: the two groups may differ in other variables as well, so the relationship should be explored further using multiple linear regression.
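
To make the test above less of a black box, the Welch statistic and its degrees of freedom can be computed by hand from the group means and variances. This is a sketch under the assumption that the data is rebuilt from the built-in datasets::mtcars with a conversion factor of 0.425144; all variable names are illustrative.

```r
# Sketch: Welch two-sample t-test computed step by step.
raw <- datasets::mtcars
kpl <- raw$mpg * 0.425144          # assumed mpg -> KPL conversion factor
auto   <- kpl[raw$am == 0]
manual <- kpl[raw$am == 1]

# Squared standard error of each group mean
se2.auto   <- var(auto) / length(auto)
se2.manual <- var(manual) / length(manual)

# Welch t statistic: difference in means over the combined standard error
t.stat <- (mean(auto) - mean(manual)) / sqrt(se2.auto + se2.manual)

# Welch-Satterthwaite approximation of the degrees of freedom
df.welch <- (se2.auto + se2.manual)^2 /
  (se2.auto^2 / (length(auto) - 1) + se2.manual^2 / (length(manual) - 1))

# Two-sided p-value
p.value <- 2 * pt(-abs(t.stat), df = df.welch)

print(c(t = t.stat, df = df.welch, p = p.value))
```

Because the t statistic is unchanged by rescaling the outcome, the result does not depend on whether consumption is expressed in MPG or KPL.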

Exploratory Data Analysis

We will now fit linear regression models to this data.

Linear Regression Models

Simple Linear Regression

model.kpl.to.transmission <- lm(Consumtion.Kpl~Transmission, data=cars)
model.summary <- summary(model.kpl.to.transmission)
model.summary
## 
## Call:
## lm(formula = Consumtion.Kpl ~ Transmission, data = cars)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -3.9931 -1.3147 -0.1264  1.3791  4.0421 
## 
## Coefficients:
##                    Estimate Std. Error t value Pr(>|t|)    
## (Intercept)          7.2901     0.4781  15.247 1.13e-15 ***
## TransmissionManual   3.0801     0.7501   4.106 0.000285 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2.084 on 30 degrees of freedom
## Multiple R-squared:  0.3598, Adjusted R-squared:  0.3385 
## F-statistic: 16.86 on 1 and 30 DF,  p-value: 0.000285

Observation

Interpreting the intercept and coefficient: the intercept (7.2901) is the mean KPL of automatic cars, and the TransmissionManual coefficient says that, on average, manual transmission cars achieve 3.0801 KPL more than automatic transmission cars (equivalent to about 7.2449 mpg). In addition, the R^2 value is 0.3598, which means that our model explains only about 36% of the variance (not sufficient). Hence transmission type alone tells us little more than the hypothesis test already did, and other variables need to be considered.
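
With a single two-level factor as predictor, the regression slope is simply the difference between the two group means. The sketch below illustrates this and converts the slope back to mpg. It assumes the data is rebuilt from the built-in datasets::mtcars with a conversion factor of 0.425144; the data frame d and the other names are illustrative.

```r
# Sketch: the slope of a one-factor regression equals the difference in means.
mpg.to.kpl <- 0.425144             # assumed conversion factor
raw <- datasets::mtcars
d <- data.frame(
  kpl = raw$mpg * mpg.to.kpl,
  transmission = factor(raw$am, levels = c(0, 1),
                        labels = c("Automatic", "Manual"))
)

fit <- lm(kpl ~ transmission, data = d)
group.means <- tapply(d$kpl, d$transmission, mean)

slope     <- unname(coef(fit)["transmissionManual"])
mean.diff <- unname(group.means["Manual"] - group.means["Automatic"])

# Same difference expressed in miles per gallon
slope.mpg <- slope / mpg.to.kpl

print(c(slope = slope, mean.diff = mean.diff, slope.mpg = slope.mpg))
```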

pairs(cars)

str(cars)
## 'data.frame':    32 obs. of  11 variables:
##  $ Cylinders         : num  6 6 4 6 8 6 8 4 4 6 ...
##  $ Displacement      : num  160 160 108 258 360 ...
##  $ Gross.Horsepower  : num  110 110 93 110 175 105 245 62 95 123 ...
##  $ Rear.Axle.Ratio   : num  3.9 3.9 3.85 3.08 3.15 2.76 3.21 3.69 3.92 3.92 ...
##  $ Weight            : num  2.62 2.88 2.32 3.21 3.44 ...
##  $ Quarter.Mile.Time : num  16.5 17 18.6 19.4 17 ...
##  $ Engine.Type       : Factor w/ 2 levels "V.Engine","Straight.Engine": 1 1 2 2 1 2 1 2 2 2 ...
##  $ Transmission      : Factor w/ 2 levels "Automatic","Manual": 2 2 2 1 1 1 1 1 1 1 ...
##  $ Gears.Number      : Factor w/ 3 levels "3","4","5": 2 2 2 1 1 1 1 2 2 2 ...
##  $ Carburetors.Number: Factor w/ 6 levels "1","2","3","4",..: 4 4 1 1 2 1 4 2 2 4 ...
##  $ Consumtion.Kpl    : num  8.93 8.93 9.69 9.1 7.95 ...
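# correlations among the numeric columns (1-6) and Consumtion.Kpl (11)
library(corrplot)
correlations <- cor(cars[,c(1,2,3,4,5,6,11)])
# create correlation plot
corrplot(correlations, method="circle")

# correlation of each variable with Consumtion.Kpl, in ascending order
sort(correlations["Consumtion.Kpl",])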
##            Weight         Cylinders      Displacement  Gross.Horsepower 
##        -0.8676594        -0.8521620        -0.8475514        -0.7761684 
## Quarter.Mile.Time   Rear.Axle.Ratio    Consumtion.Kpl 
##         0.4186840         0.6811719         1.0000000
final.model <- lm(formula = cars$Consumtion.Kpl ~ cars$Weight  + cars$Quarter.Mile.Time + cars$Transmission, data = cars)
summary(final.model)
## 
## Call:
## lm(formula = cars$Consumtion.Kpl ~ cars$Weight + cars$Quarter.Mile.Time + 
##     cars$Transmission, data = cars)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -1.4800 -0.6613 -0.3085  0.5999  1.9816 
## 
## Coefficients:
##                         Estimate Std. Error t value Pr(>|t|)    
## (Intercept)               4.0889     2.9588   1.382 0.177915    
## cars$Weight              -1.6651     0.3024  -5.507 6.95e-06 ***
## cars$Quarter.Mile.Time    0.5212     0.1227   4.247 0.000216 ***
## cars$TransmissionManual   1.2482     0.5998   2.081 0.046716 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.045 on 28 degrees of freedom
## Multiple R-squared:  0.8497, Adjusted R-squared:  0.8336 
## F-statistic: 52.75 on 3 and 28 DF,  p-value: 1.21e-11

Observation

This model captures 84.97% of the total variance in KPL (adjusted R^2 of 0.8336). The p-value of the overall F-test is 1.21e-11, so we reject the null hypothesis that none of the predictors is related to KPL. Compared with the simple linear regression model, the adjusted R^2 rises from 0.3385 to 0.8336, so the multivariate model fits the data considerably better.
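
The overall F-test printed by summary() compares the fitted model with an intercept-only model. One way to compare the three-predictor model directly with the transmission-only model is a nested-model F-test via anova(). The sketch below assumes the data is rebuilt from the built-in datasets::mtcars with a conversion factor of 0.425144; the column and object names are illustrative.

```r
# Sketch: nested-model F-test, simple vs. multivariate regression.
raw <- datasets::mtcars
d <- data.frame(
  kpl    = raw$mpg * 0.425144,     # assumed mpg -> KPL conversion factor
  weight = raw$wt,
  qsec   = raw$qsec,
  transmission = factor(raw$am, levels = c(0, 1),
                        labels = c("Automatic", "Manual"))
)

simple <- lm(kpl ~ transmission, data = d)
full   <- lm(kpl ~ weight + qsec + transmission, data = d)

# Tests whether adding weight and quarter mile time reduces the residual
# sum of squares by more than chance alone would explain
comparison <- anova(simple, full)
print(comparison)

nested.p <- comparison[["Pr(>F)"]][2]
```

A small p-value here supports the claim that the larger model is a significant improvement over the simple one.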

Result Summary

  1. This model explains 84.97% of the variance in KPL.

  2. We see that weight & quarter mile time did affect the relationship between transmission and KPL (mostly weight).

Therefore, given the above analysis, the question “Is an automatic or manual transmission better for KPL” cannot be answered without considering weight & quarter mile time.

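Again from the above analysis, to answer the question “Quantify the KPL difference between automatic and manual transmissions”, we refer to the Transmission coefficient: on average, for cars of the same weight and quarter mile time, manual transmission cars achieve 1.2482 KPL more than automatic transmission cars (equivalent to about 2.9358 mpg). With a p-value of 0.0467, this adjusted difference is only marginally significant at the 5% level.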
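
A point estimate alone hides how uncertain the adjusted difference is. As a sketch, confint() gives a 95% confidence interval for the transmission coefficient. The code assumes the data is rebuilt from the built-in datasets::mtcars with a conversion factor of 0.425144; the names d, full.fit and ci are illustrative.

```r
# Sketch: 95% confidence interval for the adjusted transmission effect.
raw <- datasets::mtcars
d <- data.frame(
  kpl    = raw$mpg * 0.425144,     # assumed mpg -> KPL conversion factor
  weight = raw$wt,
  qsec   = raw$qsec,
  transmission = factor(raw$am, levels = c(0, 1),
                        labels = c("Automatic", "Manual"))
)

full.fit <- lm(kpl ~ weight + qsec + transmission, data = d)

# Interval for the manual-vs-automatic difference, in KPL
ci <- confint(full.fit, "transmissionManual", level = 0.95)
print(ci)
```

An interval whose lower bound lies close to zero is consistent with the borderline p-value reported for this coefficient.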