Synopsis

The data come from the 1974 issue of the American magazine Motor Trend and cover fuel consumption and 10 aspects of car design and performance for 32 cars (1973–74 models). This report analyzes what affects miles per gallon and asks whether a manual or an automatic transmission gives better mileage, taking the other variables into account.

Development

Packages

The packages to be used for the project are:

library(datasets)
library(ggplot2)
library(viridis)
library(GGally)

Exploratory analysis

The “datasets” package will be used to obtain the data.

library(datasets)

We get the official description of the data we use.

?mtcars

We will use the “mtcars” data. According to its description, the data were extracted from the 1974 Motor Trend US magazine and comprise fuel consumption and 10 aspects of automobile design and performance for 32 automobiles (1973–74 models).

We look at the names and description of the columns.

    [, 1]   mpg Miles/(US) gallon
    [, 2]   cyl Number of cylinders
    [, 3]   disp    Displacement (cu.in.)
    [, 4]   hp  Gross horsepower
    [, 5]   drat    Rear axle ratio
    [, 6]   wt  Weight (1000 lbs)
    [, 7]   qsec    1/4 mile time
    [, 8]   vs  Engine (0 = V-shaped, 1 = straight)
    [, 9]   am  Transmission (0 = automatic, 1 = manual)
    [,10]   gear    Number of forward gears
    [,11]   carb    Number of carburetors

We perform a general analysis on the data.

str(mtcars)
## 'data.frame':    32 obs. of  11 variables:
##  $ mpg : num  21 21 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2 ...
##  $ cyl : num  6 6 4 6 8 6 8 4 4 6 ...
##  $ disp: num  160 160 108 258 360 ...
##  $ hp  : num  110 110 93 110 175 105 245 62 95 123 ...
##  $ drat: num  3.9 3.9 3.85 3.08 3.15 2.76 3.21 3.69 3.92 3.92 ...
##  $ wt  : num  2.62 2.88 2.32 3.21 3.44 ...
##  $ qsec: num  16.5 17 18.6 19.4 17 ...
##  $ vs  : num  0 0 1 1 0 1 0 1 1 1 ...
##  $ am  : num  1 1 1 0 0 0 0 0 0 0 ...
##  $ gear: num  4 4 4 3 3 3 3 4 4 4 ...
##  $ carb: num  4 4 1 1 2 1 4 2 2 4 ...

A solid starting point for analysis is to look at the relationship of miles per gallon to the other variables by looking at their correlation.

cor(mtcars$mpg,mtcars[,-1])
##            cyl       disp         hp      drat         wt     qsec        vs
## [1,] -0.852162 -0.8475514 -0.7761684 0.6811719 -0.8676594 0.418684 0.6640389
##             am      gear       carb
## [1,] 0.5998324 0.4802848 -0.5509251
  • The variables most negatively correlated with Miles/(US) gallon are: cyl, disp, wt.

  • The variables most positively correlated with Miles/(US) gallon are: drat, vs, am.
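
As a small sketch of how this ranking can be read off directly, the correlations can be sorted. The name mpg_cor is introduced here for illustration only, and the block uses a fresh copy of the data from the datasets package so that it runs on its own.

```r
# Fresh, unmodified copy of the data (all columns numeric)
cars <- datasets::mtcars

# Correlation of mpg with every other column, as a named vector
mpg_cor <- cor(cars$mpg, cars[, -1])[1, ]

# Sorted from most negative to most positive correlation
sort(mpg_cor)
```

The first entries of the sorted vector are the strongest negative correlations and the last entries the strongest positive ones.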

Comparison between automatic and manual transmission for MPG

We examine the dichotomous transmission variable, recalling that:

  • 0 automatic
  • 1 manual

In Appendix 1 we observe the behavior of this variable.

We run a hypothesis test to compare whether the mean miles per gallon of the two groups are equal. The alternative hypothesis is that automatic cars achieve fewer miles per gallon than manual cars.

t.test(mtcars$mpg~mtcars$am,conf.level=0.95, alternative = "less")
## 
##  Welch Two Sample t-test
## 
## data:  mtcars$mpg by mtcars$am
## t = -3.7671, df = 18.332, p-value = 0.0006868
## alternative hypothesis: true difference in means is less than 0
## 95 percent confidence interval:
##       -Inf -3.913256
## sample estimates:
## mean in group 0 mean in group 1 
##        17.14737        24.39231

The p-value of 0.0006868 is below 0.05, so we reject the null hypothesis of equal means in favor of the alternative: on average, automatic cars achieve fewer miles per gallon than manual cars. This test compares the two groups only and does not hold the other variables constant, so we next fit a regression model.
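
To make the test above more concrete, the following sketch recomputes the Welch statistic by hand from the group means and variances. The names auto, manual, t_stat, welch_df and p_value are introduced here for illustration; the formulas are the standard Welch two-sample ones, and the block reads a fresh copy of the data.

```r
cars <- datasets::mtcars

# mpg split by transmission: 0 = automatic, 1 = manual
auto   <- cars$mpg[cars$am == 0]
manual <- cars$mpg[cars$am == 1]

# Per-group variance of the mean
v_auto   <- var(auto) / length(auto)
v_manual <- var(manual) / length(manual)

# Welch t statistic: difference in means over its standard error
t_stat <- (mean(auto) - mean(manual)) / sqrt(v_auto + v_manual)

# Welch-Satterthwaite approximation of the degrees of freedom
welch_df <- (v_auto + v_manual)^2 /
  (v_auto^2 / (length(auto) - 1) + v_manual^2 / (length(manual) - 1))

# One-sided p-value for the alternative "less"
p_value <- pt(t_stat, welch_df)

c(t = t_stat, df = welch_df, p = p_value)
```

These values should agree with the t, df and p-value reported by t.test above.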

We use stepwise selection based on the Akaike information criterion (AIC) to choose the model that best explains the data.

mtcars$am <- factor(mtcars$am)
lmodel <- lm(data = mtcars, mpg ~ am + cyl + disp + hp + drat + wt + qsec + vs + gear + carb)
O_model <- step(lmodel, direction = "both")

The anova component of the step result records which variable was dropped at each step, together with the residual deviance and the AIC.

O_model$anova
##     Step Df   Deviance Resid. Df Resid. Dev      AIC
## 1        NA         NA        21   147.4944 70.89774
## 2  - cyl  1 0.07987121        22   147.5743 68.91507
## 3   - vs  1 0.26852280        23   147.8428 66.97324
## 4 - carb  1 0.68546077        24   148.5283 65.12126
## 5 - gear  1 1.56497053        25   150.0933 63.45667
## 6 - drat  1 3.34455117        26   153.4378 62.16190
## 7 - disp  1 6.62865369        27   160.0665 61.51530
## 8   - hp  1 9.21946935        28   169.2859 61.30730
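
The AIC column can be checked by hand. The sketch below assumes the form of the criterion that step uses for linear models, n * log(RSS / n) + 2 * k, where k is the number of estimated coefficients; it refits the final model on a fresh copy of the data. The names fit, rss and aic_step are illustrative.

```r
cars <- datasets::mtcars
cars$am <- factor(cars$am)

# Model reached at the last step of the selection
fit <- lm(mpg ~ am + wt + qsec, data = cars)

n   <- nrow(cars)            # number of observations
rss <- sum(residuals(fit)^2) # residual sum of squares (Resid. Dev)
k   <- length(coef(fit))     # intercept + 3 predictors

# AIC up to an additive constant, as used during stepwise selection
aic_step <- n * log(rss / n) + 2 * k

# extractAIC returns the equivalent degrees of freedom and the AIC
extractAIC(fit)
aic_step
```

Note that this value differs from AIC(fit) by an additive constant, which does not affect the comparison between models fitted to the same data.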

After stepwise selection by AIC, the selected model is:

O_model
## 
## Call:
## lm(formula = mpg ~ am + wt + qsec, data = mtcars)
## 
## Coefficients:
## (Intercept)          am1           wt         qsec  
##       9.618        2.936       -3.917        1.226

According to the stepwise algorithm of the step function, the variables that best explain the data are wt (weight), qsec and am.

summary(O_model)
## 
## Call:
## lm(formula = mpg ~ am + wt + qsec, data = mtcars)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -3.4811 -1.5555 -0.7257  1.4110  4.6610 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   9.6178     6.9596   1.382 0.177915    
## am1           2.9358     1.4109   2.081 0.046716 *  
## wt           -3.9165     0.7112  -5.507 6.95e-06 ***
## qsec          1.2259     0.2887   4.247 0.000216 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2.459 on 28 degrees of freedom
## Multiple R-squared:  0.8497, Adjusted R-squared:  0.8336 
## F-statistic: 52.75 on 3 and 28 DF,  p-value: 1.21e-11

The model explains 84.97% of the variance in miles per gallon (adjusted R-squared 83.36%). The t tests for am, wt and qsec all have p-values below 0.05, which indicates a linear relationship under the model assumptions; the intercept is not significant. Holding the other variables constant, each additional 1000 lbs of weight is associated with a decrease of 3.9165 miles per gallon, and each additional second of quarter-mile time (qsec) with an increase of 1.2259. Finally, a manual transmission is associated with an increase of 2.9358 miles per gallon compared to an automatic transmission, although with a p-value of 0.0467 this effect is only marginally significant at the 5% level.
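
One way to illustrate the meaning of the am1 coefficient is to predict miles per gallon for two cars that differ only in transmission. In the sketch below the values wt = 3 and qsec = 18 are hypothetical, chosen only for illustration, and the names fit, new_cars and pred are not from the original analysis.

```r
cars <- datasets::mtcars
cars$am <- factor(cars$am)
fit <- lm(mpg ~ am + wt + qsec, data = cars)

# Two hypothetical cars: same weight and qsec, different transmission
new_cars <- data.frame(
  am   = factor(c(0, 1), levels = c(0, 1)),
  wt   = 3,   # 3000 lbs (illustrative value)
  qsec = 18   # quarter-mile time in seconds (illustrative value)
)

pred <- predict(fit, newdata = new_cars)

# The gap between the two predictions equals the am1 coefficient
diff(pred)
coef(fit)["am1"]

# 95% confidence intervals for all coefficients
confint(fit)
```

Because the model has no interaction terms, the predicted difference between the two cars is the same for any choice of wt and qsec.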

Conclusion

Based on this model, a manual transmission is associated with higher miles per gallon than an automatic transmission: about 2.94 more, holding weight and qsec constant. Weight and qsec also significantly affect miles per gallon.

Appendix

ggplot(mtcars, aes(x=factor(am), y=mpg)) + 
        geom_violin(aes(fill=factor(am)), color = "#FAA3F4") +
        scale_fill_viridis(discrete = TRUE, alpha=0.6, option="A") +
        xlab("Transmission, 0 automatic, 1 manual") +
        ylab("Miles/(US) gallon")