Overview

Motor Trend, a magazine about the automobile industry, is looking at a data set of a collection of cars and is interested in exploring the relationship between a set of variables and miles per gallon (MPG). It is particularly interested in two questions: whether an automatic or a manual transmission is better for MPG, and how large the MPG difference between the two transmission types is.

Exploratory Analysis

An exploratory data analysis (EDA) looks at the relationship between transmission type and mpg. As can be seen in the box plot, Plot 1, cars with a manual transmission tend to have noticeably higher mpg than cars with an automatic transmission, which suggests that transmission type is associated with mpg.
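The group averages behind the box plot can also be checked numerically. The following is a minimal sketch, assuming a fresh copy of the built-in mtcars data with am still coded as 0/1; the name mpg_by_am is introduced here for illustration only.

```r
# Load a fresh copy of the built-in data set (am: 0 = automatic, 1 = manual)
data(mtcars)

# Mean mpg for each transmission type
mpg_by_am <- tapply(mtcars$mpg, mtcars$am, mean)
round(mpg_by_am, 3)

# Welch two-sample t-test for a difference in mean mpg between the two groups
t.test(mpg ~ am, data = mtcars)
```

A box plot on its own does not establish statistical significance; the t-test is one simple way to check the difference formally.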

Linear regression model

In the linear model we use transmission type (am) as the predictor and miles per gallon (mpg) as the response.

fit <- lm(mpg ~ am, data = mtcars)
summary(fit)
## 
## Call:
## lm(formula = mpg ~ am, data = mtcars)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -9.3923 -3.0923 -0.2974  3.2439  9.5077 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   17.147      1.125  15.247 1.13e-15 ***
## am             7.245      1.764   4.106 0.000285 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 4.902 on 30 degrees of freedom
## Multiple R-squared:  0.3598, Adjusted R-squared:  0.3385 
## F-statistic: 16.86 on 1 and 30 DF,  p-value: 0.000285

The linear model estimates that cars with a manual transmission (am = 1) get on average 7.24 more miles per gallon than cars with an automatic transmission (am = 0). However, the R-squared of 0.36 indicates that transmission type alone explains only 36% of the variation in mpg. We therefore need to take into account other variables that may play a role in the model.
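Because am is a 0/1 indicator, the slope in this model is the difference between the two group means, and the intercept is the mean mpg of the automatic cars. A short sketch of that check, assuming am is still numeric (the names group_means and diff_means are illustrative, not part of the original analysis):

```r
data(mtcars)
fit <- lm(mpg ~ am, data = mtcars)

# Mean mpg per transmission type (element names are "0" and "1")
group_means <- tapply(mtcars$mpg, mtcars$am, mean)
diff_means <- unname(group_means["1"] - group_means["0"])

# The slope equals the difference in group means
all.equal(unname(coef(fit)["am"]), diff_means)

# 95% confidence interval for that difference
confint(fit, "am")
```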

Multivariable regression model

A multivariable regression model takes into account additional variables that are related to mpg. We identify candidate variables with a pairs plot, Plot 2. From it we can tell that cyl, disp, hp and wt have the strongest correlations with mpg. We build a new model, fit_new, that adds these variables, and compare it with the simple linear model fit using a nested-model F-test (anova).
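The variable choice read off the pairs plot can be cross-checked with pairwise correlations. This is a sketch under the assumption that all columns of mtcars are still numeric (that is, am has not yet been converted to a factor); the names cors and ranked are illustrative.

```r
data(mtcars)

# Pearson correlation of every variable with mpg, dropping mpg itself
cors <- cor(mtcars)[, "mpg"]
cors <- cors[names(cors) != "mpg"]

# Rank variables by the absolute strength of their correlation with mpg
ranked <- cors[order(abs(cors), decreasing = TRUE)]
round(ranked, 3)
```

Ranking by absolute value matters here because most of the strong correlations with mpg are negative.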

fit_new <- lm(mpg ~ am + cyl + disp + hp + wt, data = mtcars)
anova(fit, fit_new)
## Analysis of Variance Table
## 
## Model 1: mpg ~ am
## Model 2: mpg ~ am + cyl + disp + hp + wt
##   Res.Df    RSS Df Sum of Sq      F    Pr(>F)    
## 1     30 720.90                                  
## 2     26 163.12  4    557.78 22.226 4.507e-08 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

The resulting p-value of 4.507e-08 shows that adding cyl, disp, hp and wt significantly improves on the original simple linear regression model fit: the residual sum of squares drops from 720.90 to 163.12.
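The F statistic in the table can be reproduced by hand from the two residual sums of squares, which makes clear what the test compares. A small worked sketch using the rounded values printed above (the variable names are illustrative):

```r
# Values taken from the analysis of variance table
rss_small <- 720.90   # residual sum of squares, mpg ~ am
rss_large <- 163.12   # residual sum of squares, mpg ~ am + cyl + disp + hp + wt
df_small  <- 30       # residual degrees of freedom, smaller model
df_large  <- 26       # residual degrees of freedom, larger model

# F = (drop in RSS per added term) / (residual variance of the larger model)
f_stat <- ((rss_small - rss_large) / (df_small - df_large)) / (rss_large / df_large)

# Upper-tail probability of the F distribution with (4, 26) degrees of freedom
p_val <- pf(f_stat, df_small - df_large, df_large, lower.tail = FALSE)

round(f_stat, 3)
p_val
```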

summary(fit_new)
## 
## Call:
## lm(formula = mpg ~ am + cyl + disp + hp + wt, data = mtcars)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -3.5952 -1.5864 -0.7157  1.2821  5.5725 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 38.20280    3.66910  10.412 9.08e-11 ***
## am           1.55649    1.44054   1.080  0.28984    
## cyl         -1.10638    0.67636  -1.636  0.11393    
## disp         0.01226    0.01171   1.047  0.30472    
## hp          -0.02796    0.01392  -2.008  0.05510 .  
## wt          -3.30262    1.13364  -2.913  0.00726 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2.505 on 26 degrees of freedom
## Multiple R-squared:  0.8551, Adjusted R-squared:  0.8273 
## F-statistic:  30.7 on 5 and 26 DF,  p-value: 4.029e-10

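The fit_new model explains 85.5% of the variance in mpg (adjusted R-squared 0.83), which indicates that cyl, disp, hp and wt account for much of the apparent relationship between mpg and am. Holding these variables fixed, the model estimates that a manual transmission increases miles per gallon by 1.556 compared with an automatic transmission, but this estimate is not statistically significant (p = 0.29).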
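One way to gauge how precisely the transmission effect is estimated in fit_new is to look at its confidence interval rather than the point estimate alone. A minimal sketch, assuming am is numeric so that the coefficient is named am (the name ci_am is illustrative):

```r
data(mtcars)
fit_new <- lm(mpg ~ am + cyl + disp + hp + wt, data = mtcars)

# 95% confidence interval for the am coefficient, adjusted for cyl, disp, hp and wt
ci_am <- confint(fit_new, "am", level = 0.95)
ci_am
```

If the interval contains zero, the data are compatible with no difference between transmission types once the other variables are held fixed.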

Residual analysis and diagnostics

Plot 3 tells us the following:

Executive summary

This study shows that, taken on its own, miles per gallon is higher with a manual transmission than with an automatic transmission, by about 7.24 mpg on average. However, the difference depends strongly on other variables. Number of cylinders (cyl), displacement (disp), horsepower (hp) and weight (wt) all influence how much mpg differs between the transmission types. When cyl, disp, hp and wt are included in the regression model, the estimated advantage of a manual transmission shrinks to 1.556 mpg and is no longer statistically significant (p = 0.29), so we cannot say with confidence that transmission type by itself improves mpg.

APPENDIX

Plot 1

data(mtcars)
str(mtcars)
## 'data.frame':    32 obs. of  11 variables:
##  $ mpg : num  21 21 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2 ...
##  $ cyl : num  6 6 4 6 8 6 8 4 4 6 ...
##  $ disp: num  160 160 108 258 360 ...
##  $ hp  : num  110 110 93 110 175 105 245 62 95 123 ...
##  $ drat: num  3.9 3.9 3.85 3.08 3.15 2.76 3.21 3.69 3.92 3.92 ...
##  $ wt  : num  2.62 2.88 2.32 3.21 3.44 ...
##  $ qsec: num  16.5 17 18.6 19.4 17 ...
##  $ vs  : num  0 0 1 1 0 1 0 1 1 1 ...
##  $ am  : num  1 1 1 0 0 0 0 0 0 0 ...
##  $ gear: num  4 4 4 3 3 3 3 4 4 4 ...
##  $ carb: num  4 4 1 1 2 1 4 2 2 4 ...

[, 1] mpg   Miles/(US) gallon
[, 2] cyl   Number of cylinders
[, 3] disp  Displacement (cu.in.)
[, 4] hp    Gross horsepower
[, 5] drat  Rear axle ratio
[, 6] wt    Weight (1000 lbs)
[, 7] qsec  1/4 mile time
[, 8] vs    V/S
[, 9] am    Transmission (0 = automatic, 1 = manual)
[,10] gear  Number of forward gears
[,11] carb  Number of carburetors

mtcars$am <- as.factor(mtcars$am)
boxplot(mpg ~ am, data = mtcars, xlab = "transmission type", ylab = "miles per gallon")

Plot 2

pairs(mtcars, panel = panel.smooth)

Plot 3

par(mfrow = c(2,2))
plot(fit_new)
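As a numerical complement to the four diagnostic plots, leverage, influence and residual normality can also be inspected directly. This is only a sketch of one possible set of checks, not the analysis the report itself ran; the names lev and cook are illustrative.

```r
data(mtcars)
fit_new <- lm(mpg ~ am + cyl + disp + hp + wt, data = mtcars)

# Leverage: diagonal of the hat matrix; the values sum to the number of coefficients
lev <- hatvalues(fit_new)
head(sort(lev, decreasing = TRUE), 3)

# Influence: Cook's distance for each car
cook <- cooks.distance(fit_new)
head(sort(cook, decreasing = TRUE), 3)

# Normality of the residuals (Shapiro-Wilk test)
shapiro.test(residuals(fit_new))
```

Cars that rank high on both leverage and Cook's distance are the ones worth examining individually.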