Executive Summary

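This project analyses mtcars, a data set extracted from the 1974 Motor Trend US magazine that comprises fuel consumption and 10 aspects of automobile design and performance for 32 automobiles. The goal of the analysis is to explore the relationship between a set of variables and miles per gallon (MPG). The analysis shows that cars with manual transmission average about 7.2 mpg more than automatics in an unadjusted comparison, but once cylinders, horsepower and weight are accounted for, the estimated difference shrinks to about 1.8 mpg and is no longer statistically significant (p = 0.21).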

Exploratory Analysis

The figures can be found in the Appendix at the end of the document.

data(mtcars)
mtcars$am <- as.factor(mtcars$am) 
mtcars$cyl <- as.factor(mtcars$cyl)
mtcars$gear <- as.factor(mtcars$gear)
mtcars$vs <- as.factor(mtcars$vs)
mtcars$carb <- as.factor(mtcars$carb)

The data set is a data frame with 32 observations on 11 variables (Figure 1). Several variables (wt, hp, cyl, disp, vs, am) appear to be associated with mpg (Figure 2). Focusing the analysis on transmission type am, the boxplot (Figure 3) shows that cars with manual transmission tend to have higher miles per gallon. A Welch two-sample t-test (Figure 4) confirms this: the mean is 24.4 mpg for manual cars against 17.1 mpg for automatic cars, and the null hypothesis of no difference is rejected (p = 0.0014).
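The visual impression from the pairs plot can be backed with numbers. The following is a minimal sketch, not part of the original analysis: it assumes a fresh, unconverted copy of the data (here called `cars`, a name chosen for illustration) so that every column is still numeric, and it ranks the variables by the strength of their Pearson correlation with mpg.

```r
# Fresh copy so the example does not depend on the factor conversions above
cars <- datasets::mtcars

# Pearson correlation of mpg with every other column (all numeric in the raw data)
predictors <- cars[, names(cars) != "mpg"]
mpg_cor <- sapply(predictors, function(x) cor(cars$mpg, x))

# Sort by absolute strength, strongest first
mpg_cor_sorted <- mpg_cor[order(abs(mpg_cor), decreasing = TRUE)]
print(round(mpg_cor_sorted, 2))
```

Treating cyl, vs and am as numeric codes is a simplification made only for this ranking; the models below keep them as factors.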

Regression Models

Simple Model

The first model has only one independent variable, am. The coefficient for am1 represents the difference in mean mpg between manual and automatic cars: manual cars average about 7.2 mpg more. Both coefficients are statistically significant, but the R-squared shows that the model explains only about 36% of the variance in mpg.

fit_easy <- lm(mpg~am,data=mtcars)
summary(fit_easy)
## 
## Call:
## lm(formula = mpg ~ am, data = mtcars)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -9.3923 -3.0923 -0.2974  3.2439  9.5077 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   17.147      1.125  15.247 1.13e-15 ***
## am1            7.245      1.764   4.106 0.000285 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 4.902 on 30 degrees of freedom
## Multiple R-squared:  0.3598, Adjusted R-squared:  0.3385 
## F-statistic: 16.86 on 1 and 30 DF,  p-value: 0.000285
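With a single two-level factor as predictor, the regression reduces to a comparison of group means. The sketch below illustrates this under the assumption of a fresh copy of the data; the names `cars`, `group_means`, `mean_diff` and `slope`, and the factor labels, are introduced here for illustration only.

```r
# Fresh copy with a labelled transmission factor (0 = Automatic, 1 = Manual)
cars <- datasets::mtcars
cars$am <- factor(cars$am, labels = c("Automatic", "Manual"))

# Mean mpg within each transmission group
group_means <- tapply(cars$mpg, cars$am, mean)

# Same one-predictor model as fit_easy, refitted on the labelled factor
fit <- lm(mpg ~ am, data = cars)

# The intercept is the automatic mean; the slope is manual minus automatic
mean_diff <- unname(group_means["Manual"] - group_means["Automatic"])
slope <- unname(coef(fit)["amManual"])
print(all.equal(mean_diff, slope))
```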

Multivariable Model

A multivariable model is built using stepwise model selection to choose which predictors to keep. Starting from the full model, step() adds or drops terms in both directions to minimise AIC.

fit_all <- lm(mpg~.,data=mtcars)
fit_multi <- step(fit_all,direction = 'both')
summary(fit_multi)
## 
## Call:
## lm(formula = mpg ~ cyl + hp + wt + am, data = mtcars)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -3.9387 -1.2560 -0.4013  1.1253  5.0513 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 33.70832    2.60489  12.940 7.73e-13 ***
## cyl6        -3.03134    1.40728  -2.154  0.04068 *  
## cyl8        -2.16368    2.28425  -0.947  0.35225    
## hp          -0.03211    0.01369  -2.345  0.02693 *  
## wt          -2.49683    0.88559  -2.819  0.00908 ** 
## am1          1.80921    1.39630   1.296  0.20646    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2.41 on 26 degrees of freedom
## Multiple R-squared:  0.8659, Adjusted R-squared:  0.8401 
## F-statistic: 33.57 on 5 and 26 DF,  p-value: 1.506e-10

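The multivariable model has a much higher R-squared (0.87, adjusted 0.84), so it explains considerably more of the variance than the simple model. The coefficient for am1 is still positive, estimating that manual cars get about 1.8 mpg more than automatic cars when cyl, hp and wt are held constant. However, this coefficient is not statistically significant (p = 0.21), so once those variables are accounted for the data do not provide clear evidence of a transmission effect.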
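Two further checks help in reading the am coefficient. The sketch below is an illustration rather than part of the original analysis: it refits both models on a fresh copy of the data (the names `cars`, `fit_simple` and `fit_adjusted` are assumptions), computes a 95% confidence interval for the adjusted transmission effect, and compares the two nested models with an F-test.

```r
# Fresh copy with the two factors the selected model needs
cars <- datasets::mtcars
cars$am  <- factor(cars$am)
cars$cyl <- factor(cars$cyl)

fit_simple   <- lm(mpg ~ am, data = cars)
fit_adjusted <- lm(mpg ~ cyl + hp + wt + am, data = cars)

# 95% confidence interval for the transmission effect after adjustment
ci_am <- confint(fit_adjusted)["am1", ]
print(ci_am)

# F-test of the nested models: do cyl, hp and wt add explanatory power?
cmp <- anova(fit_simple, fit_adjusted)
print(cmp)
```

An interval that includes zero is consistent with the non-significant p-value reported in the summary, while a small p-value in the F-test indicates that the additional predictors improve the fit.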

Residuals and Diagnostics

Finally, the model residuals are plotted (Figure 5). The residuals-versus-fitted plot shows the points scattered without an obvious pattern, and the normal Q-Q plot suggests the residuals are approximately normally distributed. The Scale-Location plot seems to indicate roughly constant variance of the residuals. There are a few outliers, but they do not seem to have a large impact on the model.
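The visual diagnostics can be complemented with numeric ones. The following sketch assumes the selected model is refitted on a fresh copy of the data (`cars` and `fit` are names chosen for illustration). It runs a Shapiro-Wilk test on the residuals and lists the observations with the highest leverage and Cook's distance, which is one way to identify the outliers mentioned above.

```r
# Refit the selected model on a fresh copy of the data
cars <- datasets::mtcars
cars$am  <- factor(cars$am)
cars$cyl <- factor(cars$cyl)
fit <- lm(mpg ~ cyl + hp + wt + am, data = cars)

# Shapiro-Wilk test: a large p-value gives no evidence against normality
sw <- shapiro.test(resid(fit))
print(sw)

# Influence measures, largest first
lev  <- sort(hatvalues(fit), decreasing = TRUE)
cook <- sort(cooks.distance(fit), decreasing = TRUE)
print(head(lev, 3))
print(head(cook, 3))
```

With only 32 observations the normality test has limited power, so it is best read alongside the Q-Q plot rather than in place of it.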

Appendix

Figure 1

str(mtcars)
## 'data.frame':    32 obs. of  11 variables:
##  $ mpg : num  21 21 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2 ...
##  $ cyl : Factor w/ 3 levels "4","6","8": 2 2 1 2 3 2 3 1 1 2 ...
##  $ disp: num  160 160 108 258 360 ...
##  $ hp  : num  110 110 93 110 175 105 245 62 95 123 ...
##  $ drat: num  3.9 3.9 3.85 3.08 3.15 2.76 3.21 3.69 3.92 3.92 ...
##  $ wt  : num  2.62 2.88 2.32 3.21 3.44 ...
##  $ qsec: num  16.5 17 18.6 19.4 17 ...
##  $ vs  : Factor w/ 2 levels "0","1": 1 1 2 2 1 2 1 2 2 2 ...
##  $ am  : Factor w/ 2 levels "0","1": 2 2 2 1 1 1 1 1 1 1 ...
##  $ gear: Factor w/ 3 levels "3","4","5": 2 2 2 1 1 1 1 2 2 2 ...
##  $ carb: Factor w/ 6 levels "1","2","3","4",..: 4 4 1 1 2 1 4 2 2 4 ...

Figure 2

pairs(mtcars)

Figure 3

boxplot(mpg~am,data=mtcars, col=c('blue','red'),xlab='Transmission',ylab='MPG',
        main='Transmission Type and MPG',
        names=c('Automatic','Manual'))

Figure 4

t.test(mpg~am,data=mtcars)
## 
##  Welch Two Sample t-test
## 
## data:  mpg by am
## t = -3.7671, df = 18.332, p-value = 0.001374
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -11.280194  -3.209684
## sample estimates:
## mean in group 0 mean in group 1 
##        17.14737        24.39231

Figure 5

par(mfrow=c(2,2))
plot(fit_multi)