Executive Summary

Motor Trend magazine is interested in exploring the relationship between a set of variables and miles per gallon (MPG). They are particularly interested in the following two questions:

  • Is an automatic or manual transmission better for MPG?
  • How large is the MPG difference between automatic and manual transmissions?

    My analysis shows that:

  • Taken on its own, a manual transmission is associated with 7.25 MPG better mileage than an automatic transmission on average; however, transmission type alone explains only 36% of the variance in MPG.
  • After adjusting for three additional explanatory variables (cylinders, horsepower and weight), the estimated advantage of manual transmissions shrinks to 1.48 MPG and is no longer statistically significant (p = 0.31); this larger model explains 85% of the variance in MPG.

  • Exploratory Data Analysis

    library(datasets)
    data(mtcars)
    
    # View the first few rows of the dataset:
    head(mtcars, 5)
    ##                    mpg cyl disp  hp drat    wt  qsec vs am gear carb
    ## Mazda RX4         21.0   6  160 110 3.90 2.620 16.46  0  1    4    4
    ## Mazda RX4 Wag     21.0   6  160 110 3.90 2.875 17.02  0  1    4    4
    ## Datsun 710        22.8   4  108  93 3.85 2.320 18.61  1  1    4    1
    ## Hornet 4 Drive    21.4   6  258 110 3.08 3.215 19.44  1  0    3    1
    ## Hornet Sportabout 18.7   8  360 175 3.15 3.440 17.02  0  0    3    2
    #Variables:
    str(mtcars)
    ## 'data.frame':    32 obs. of  11 variables:
    ##  $ mpg : num  21 21 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2 ...
    ##  $ cyl : num  6 6 4 6 8 6 8 4 4 6 ...
    ##  $ disp: num  160 160 108 258 360 ...
    ##  $ hp  : num  110 110 93 110 175 105 245 62 95 123 ...
    ##  $ drat: num  3.9 3.9 3.85 3.08 3.15 2.76 3.21 3.69 3.92 3.92 ...
    ##  $ wt  : num  2.62 2.88 2.32 3.21 3.44 ...
    ##  $ qsec: num  16.5 17 18.6 19.4 17 ...
    ##  $ vs  : num  0 0 1 1 0 1 0 1 1 1 ...
    ##  $ am  : num  1 1 1 0 0 0 0 0 0 0 ...
    ##  $ gear: num  4 4 4 3 3 3 3 4 4 4 ...
    ##  $ carb: num  4 4 1 1 2 1 4 2 2 4 ...
    #Statistical summary of mpg variable:
    summary(mtcars$mpg)
    ##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
    ##   10.40   15.42   19.20   20.09   22.80   33.90
    #Visualization ~ Automatic vs Manual Transmission:
    library(ggplot2)
    mtcars$am <- as.factor(mtcars$am)
    transTyp <- ggplot(aes(x=am, y=mpg), data=mtcars) + geom_boxplot(aes(fill=am))
    transTyp <- transTyp + labs(title = "Automatic vs Manual Transmission Boxplot")
    transTyp <- transTyp + xlab("Transmission Type")
    transTyp <- transTyp + ylab("MPG")
    transTyp <- transTyp + labs(fill = "Legend (0=AT, 1=MT)")
    transTyp

    #Automatic vs Manual Transmission boxplot stats:
    transStats <- split(mtcars$mpg, mtcars$am)
    
    #Mean:
    sapply(transStats, mean)
    ##        0        1 
    ## 17.14737 24.39231
    #Stdev:
    sapply(transStats, sd)
    ##        0        1 
    ## 3.833966 6.166504
    #Range:
    sapply(transStats, range)
    ##         0    1
    ## [1,] 10.4 15.0
    ## [2,] 24.4 33.9
    #Automatic vs Manual Transmission Hypothesis Test:
    autoTrans <- mtcars[mtcars$am == "0",]
    manTrans <- mtcars[mtcars$am == "1",]
    t.test(autoTrans$mpg, manTrans$mpg)
    ## 
    ##  Welch Two Sample t-test
    ## 
    ## data:  autoTrans$mpg and manTrans$mpg
    ## t = -3.7671, df = 18.332, p-value = 0.001374
    ## alternative hypothesis: true difference in means is not equal to 0
    ## 95 percent confidence interval:
    ##  -11.280194  -3.209684
    ## sample estimates:
    ## mean of x mean of y 
    ##  17.14737  24.39231

    SYNOPSIS The boxplot above suggests that manual transmissions provide better gas mileage than automatics. To test this claim, a Welch two-sample t-test is performed. It rejects the null hypothesis of equal mean MPG (p = 0.0014), i.e., the difference in mean gas mileage between the two transmission types is statistically significant. Regression analyses will now be performed to quantify how much of the variation in gas mileage transmission type accounts for.
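
As a cross-check on the test above, the same Welch comparison can be run through the formula interface of `t.test`, which also makes it easy to pull out the difference in group means and its confidence interval. This is only a minimal sketch: the names `cars`, `welch`, `groupMeans` and `meanDiff` are illustrative and do not appear in the analysis above, and a fresh copy of `mtcars` is used so the block runs on its own.

```r
library(datasets)

# Work on a fresh copy so the sketch does not depend on earlier objects.
cars <- datasets::mtcars
cars$am <- factor(cars$am, levels = c(0, 1), labels = c("automatic", "manual"))

# Welch two-sample t-test via the formula interface (mpg grouped by am).
welch <- t.test(mpg ~ am, data = cars)

# Group means and their difference (manual minus automatic).
groupMeans <- tapply(cars$mpg, cars$am, mean)
meanDiff <- unname(groupMeans["manual"] - groupMeans["automatic"])

round(meanDiff, 2)        # difference in mean MPG, manual minus automatic
round(welch$conf.int, 2)  # 95% CI for automatic minus manual (hence negative)
welch$p.value             # p-value of the two-sided test
```

Note that the confidence interval is reported for the first factor level minus the second (automatic minus manual), which is why both bounds are negative even though manual cars have the higher mean.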

    Regression Models

  • Linear Regression Model

    lrModel <- lm(mpg ~ am, data = mtcars)
    summary(lrModel)
    ## 
    ## Call:
    ## lm(formula = mpg ~ am, data = mtcars)
    ## 
    ## Residuals:
    ##     Min      1Q  Median      3Q     Max 
    ## -9.3923 -3.0923 -0.2974  3.2439  9.5077 
    ## 
    ## Coefficients:
    ##             Estimate Std. Error t value Pr(>|t|)    
    ## (Intercept)   17.147      1.125  15.247 1.13e-15 ***
    ## am1            7.245      1.764   4.106 0.000285 ***
    ## ---
    ## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
    ## 
    ## Residual standard error: 4.902 on 30 degrees of freedom
    ## Multiple R-squared:  0.3598, Adjusted R-squared:  0.3385 
    ## F-statistic: 16.86 on 1 and 30 DF,  p-value: 0.000285
  • Multivariable Regression Model

    mrModel <- lm(mpg~am + cyl + hp + wt, data = mtcars)
    anova(lrModel, mrModel)
    ## Analysis of Variance Table
    ## 
    ## Model 1: mpg ~ am
    ## Model 2: mpg ~ am + cyl + hp + wt
    ##   Res.Df   RSS Df Sum of Sq      F    Pr(>F)    
    ## 1     30 720.9                                  
    ## 2     27 170.0  3     550.9 29.166 1.274e-08 ***
    ## ---
    ## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
    summary(mrModel)
    ## 
    ## Call:
    ## lm(formula = mpg ~ am + cyl + hp + wt, data = mtcars)
    ## 
    ## Residuals:
    ##     Min      1Q  Median      3Q     Max 
    ## -3.4765 -1.8471 -0.5544  1.2758  5.6608 
    ## 
    ## Coefficients:
    ##             Estimate Std. Error t value Pr(>|t|)    
    ## (Intercept) 36.14654    3.10478  11.642 4.94e-12 ***
    ## am1          1.47805    1.44115   1.026   0.3142    
    ## cyl         -0.74516    0.58279  -1.279   0.2119    
    ## hp          -0.02495    0.01365  -1.828   0.0786 .  
    ## wt          -2.60648    0.91984  -2.834   0.0086 ** 
    ## ---
    ## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
    ## 
    ## Residual standard error: 2.509 on 27 degrees of freedom
    ## Multiple R-squared:  0.849,  Adjusted R-squared:  0.8267 
    ## F-statistic: 37.96 on 4 and 27 DF,  p-value: 1.025e-10

    SYNOPSIS A simple linear regression model is first fitted to find out how much of an effect transmission type actually has on gas mileage, the initial claim supported by the preliminary exploratory data analysis. In this model, manual transmissions provide 7.25 MPG (the am1 coefficient) better mileage on average than automatic ones. However, based on the R-squared value, transmission type explains only 36% of the variance in MPG, so this simple linear regression is not a very good model for answering Motor Trend’s questions with any definitiveness. A more logical approach is to fit a multivariable regression model that takes into account other variables likely to affect a vehicle’s gas mileage. I therefore took three such variables from the dataset (number of cylinders, engine horsepower and vehicle weight) and added them to the model. The ANOVA comparison shows that this second model fits significantly better than the first (p = 1.274e-08), and it explains 85% of the variance in MPG. With the additional variables held constant, the estimated advantage of manual transmissions over automatic ones drops to 1.48 MPG, and this coefficient is no longer statistically significant (p = 0.31).
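
One way to make the comparison between the two models more concrete is to look at confidence intervals for the transmission coefficient rather than point estimates alone. The sketch below is an illustration under stated assumptions, not part of the original analysis: it refits both models on a fresh copy of `mtcars`, and the names `cars`, `simpleFit`, `adjustedFit`, `ciSimple`, `ciAdjusted` and `r2` are hypothetical.

```r
library(datasets)

# Fresh copy of the data with transmission coded as a factor (0 = AT, 1 = MT).
cars <- datasets::mtcars
cars$am <- factor(cars$am)

# Refit the two models discussed above.
simpleFit <- lm(mpg ~ am, data = cars)
adjustedFit <- lm(mpg ~ am + cyl + hp + wt, data = cars)

# 95% confidence intervals for the manual-transmission coefficient (am1).
ciSimple <- confint(simpleFit)["am1", ]
ciAdjusted <- confint(adjustedFit)["am1", ]

# Share of MPG variance explained by each model.
r2 <- c(amOnly = summary(simpleFit)$r.squared,
        multivariable = summary(adjustedFit)$r.squared)

round(ciSimple, 2)    # interval lies entirely above zero
round(ciAdjusted, 2)  # interval includes zero
round(r2, 2)
```

The interval for `am1` in the multivariable model spans zero, which is consistent with its p-value of 0.31: once cylinders, horsepower and weight are accounted for, the data cannot rule out that the transmission effect is nil.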

  • Appendix

    #Scatterplot matrix of the dataset:
    pairs(mpg ~ ., data = mtcars)

    #Diagnostic plots of the multivariable regression model (residuals vs fitted, normal Q-Q, scale-location, residuals vs leverage):
    par(mfrow = c(2,2))
    plot(mrModel)