1. Executive summary

1.1) The results indicate that cars with automatic transmission have a statistically significantly lower mean mpg (17.15) than cars with manual transmission (24.39); that is, automatic cars have the higher fuel consumption.

1.2) The R-squared of the optimized model (0.89; adjusted 0.88) indicates that it explains most of the variance in mpg and fits the data well.

1.3) Transmission type is not the only variable that determines fuel consumption. It also depends on the weight of the car (wt) and on its quarter-mile time (qsec), a proxy for acceleration.

2. Data pre-processing

Getting the data:

data(mtcars)

See the variables that make up the data set:

names(mtcars)
##  [1] "mpg"  "cyl"  "disp" "hp"   "drat" "wt"   "qsec" "vs"   "am"   "gear"
## [11] "carb"

See the main metrics of the variables:

summary(mtcars)
##       mpg             cyl             disp             hp       
##  Min.   :10.40   Min.   :4.000   Min.   : 71.1   Min.   : 52.0  
##  1st Qu.:15.43   1st Qu.:4.000   1st Qu.:120.8   1st Qu.: 96.5  
##  Median :19.20   Median :6.000   Median :196.3   Median :123.0  
##  Mean   :20.09   Mean   :6.188   Mean   :230.7   Mean   :146.7  
##  3rd Qu.:22.80   3rd Qu.:8.000   3rd Qu.:326.0   3rd Qu.:180.0  
##  Max.   :33.90   Max.   :8.000   Max.   :472.0   Max.   :335.0  
##       drat             wt             qsec             vs        
##  Min.   :2.760   Min.   :1.513   Min.   :14.50   Min.   :0.0000  
##  1st Qu.:3.080   1st Qu.:2.581   1st Qu.:16.89   1st Qu.:0.0000  
##  Median :3.695   Median :3.325   Median :17.71   Median :0.0000  
##  Mean   :3.597   Mean   :3.217   Mean   :17.85   Mean   :0.4375  
##  3rd Qu.:3.920   3rd Qu.:3.610   3rd Qu.:18.90   3rd Qu.:1.0000  
##  Max.   :4.930   Max.   :5.424   Max.   :22.90   Max.   :1.0000  
##        am              gear            carb      
##  Min.   :0.0000   Min.   :3.000   Min.   :1.000  
##  1st Qu.:0.0000   1st Qu.:3.000   1st Qu.:2.000  
##  Median :0.0000   Median :4.000   Median :2.000  
##  Mean   :0.4062   Mean   :3.688   Mean   :2.812  
##  3rd Qu.:1.0000   3rd Qu.:4.000   3rd Qu.:4.000  
##  Max.   :1.0000   Max.   :5.000   Max.   :8.000

3. Data analysis

Check the correlation between mpg and each of the other variables:

cor(mtcars$mpg, mtcars[,-1])
##            cyl       disp         hp      drat         wt     qsec        vs
## [1,] -0.852162 -0.8475514 -0.7761684 0.6811719 -0.8676594 0.418684 0.6640389
##             am      gear       carb
## [1,] 0.5998324 0.4802848 -0.5509251

The pairwise relationships between the variables can be inspected visually with a scatterplot matrix:

pairs(mpg ~ ., data=mtcars)

The strongest correlations with mpg are, in order, with 1) wt (-0.87), 2) cyl (-0.85), and 3) disp (-0.85). All three are negative.
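The ranking can also be obtained programmatically rather than read off the printed matrix. The following is a minimal sketch; the names `cars`, `mpg_cor` and `ranked` are introduced here for illustration and are not part of the original analysis.

```r
# Work on a fresh copy so that changes made to mtcars elsewhere do not matter
cars <- datasets::mtcars

# Correlation of mpg with every other column, flattened to a named vector
mpg_cor <- cor(cars$mpg, cars[, -1])[1, ]

# Order by absolute strength, strongest first
ranked <- mpg_cor[order(abs(mpg_cor), decreasing = TRUE)]
round(ranked, 3)
```

Sorting by absolute value keeps strong negative correlations such as wt at the top of the list instead of at the bottom.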

4. What type of transmission should I choose?

First, convert the am variable to a factor so that its two levels (0 = automatic, 1 = manual) are easier to interpret:

mtcars$am <- as.factor(mtcars$am)
levels(mtcars$am) <- c("Automatic","Manual")
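As a hedged alternative, the mapping from codes to labels can be made explicit in a single call, which avoids relying on the order of the levels. This sketch works on a copy named `cars` (a name assumed here) and then checks the group sizes and group means.

```r
# Fresh copy of the data; am is still numeric here (0 = automatic, 1 = manual)
cars <- datasets::mtcars

# Map each code to its label explicitly
cars$am <- factor(cars$am, levels = c(0, 1), labels = c("Automatic", "Manual"))

# Number of cars per transmission type
table(cars$am)

# Mean mpg per transmission type
tapply(cars$mpg, cars$am, mean)
```

The data set contains 19 automatic and 13 manual cars, so the two groups are unbalanced but both reasonably populated.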

We can use a boxplot to get a visual idea of how mpg differs between the two transmission types:

library(ggplot2)

# Boxplot of mpg per transmission type, with the individual cars overlaid as jittered points
ggplot(mtcars, aes(x = am, y = mpg, fill = am)) +
        geom_boxplot() +
        geom_jitter(color = "black", size = 0.4, alpha = 0.9) +
        theme(legend.position = "none",
              plot.title = element_text(size = 11)) +
        ggtitle("Boxplot of mpg by transmission type") +
        xlab("Transmission type") +
        ylab("Miles per gallon (mpg)")

However, we need a statistical test to support this visual impression:

test.T <- t.test(mtcars$mpg ~ mtcars$am, conf.level=0.95)
test.T
## 
##  Welch Two Sample t-test
## 
## data:  mtcars$mpg by mtcars$am
## t = -3.7671, df = 18.332, p-value = 0.001374
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -11.280194  -3.209684
## sample estimates:
## mean in group Automatic    mean in group Manual 
##                17.14737                24.39231

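The p-value (0.0014) is less than 0.05, so we reject the hypothesis that the two group means are equal: automatic cars average 17.15 mpg versus 24.39 mpg for manual cars. Because mpg measures distance travelled per unit of fuel, the lower value means that automatic cars have the higher fuel consumption. Note that this test does not adjust for any other characteristic of the cars.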
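To make the test less of a black box, the Welch statistic can be reproduced by hand. This is a minimal sketch, assuming the raw 0/1 coding of am; the variable names are chosen here for illustration.

```r
# Fresh copy of the data; am is numeric in the raw data set (0 = automatic, 1 = manual)
cars <- datasets::mtcars
auto_mpg   <- cars$mpg[cars$am == 0]
manual_mpg <- cars$mpg[cars$am == 1]

# Difference in group means (automatic minus manual)
diff_means <- mean(auto_mpg) - mean(manual_mpg)

# Variance of each group mean; Welch's test does not pool the two variances
v_auto   <- var(auto_mpg) / length(auto_mpg)
v_manual <- var(manual_mpg) / length(manual_mpg)

# Test statistic
t_stat <- diff_means / sqrt(v_auto + v_manual)

# Welch-Satterthwaite approximation to the degrees of freedom
df_welch <- (v_auto + v_manual)^2 /
  (v_auto^2 / (length(auto_mpg) - 1) + v_manual^2 / (length(manual_mpg) - 1))

# Two-sided p-value
p_value <- 2 * pt(-abs(t_stat), df = df_welch)

round(c(diff = diff_means, t = t_stat, df = df_welch, p = p_value), 4)
```

The values should agree with the t, df and p-value reported by t.test, and the difference of about -7.24 mpg is the unadjusted gap between the two groups.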

5. Quantifying the mpg difference

We build a stepwise model, starting from all the variables, with the following call (allowing up to 10,000 steps):

stepwise.model <- step(lm(data=mtcars, mpg ~ .), trace=0, steps=10000)
summary(stepwise.model)
## 
## Call:
## lm(formula = mpg ~ wt + qsec + am, data = mtcars)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -3.4811 -1.5555 -0.7257  1.4110  4.6610 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   9.6178     6.9596   1.382 0.177915    
## wt           -3.9165     0.7112  -5.507 6.95e-06 ***
## qsec          1.2259     0.2887   4.247 0.000216 ***
## amManual      2.9358     1.4109   2.081 0.046716 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2.459 on 28 degrees of freedom
## Multiple R-squared:  0.8497, Adjusted R-squared:  0.8336 
## F-statistic: 52.75 on 3 and 28 DF,  p-value: 1.21e-11
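The amManual coefficient in this summary is the adjusted mpg difference between the two transmission types. A short sketch of how to extract it together with a confidence interval follows; it refits the selected formula directly on a copy of the data (`cars` and `fit` are names assumed here) instead of rerunning step.

```r
# Fresh copy of the data with am recoded as a labelled factor
cars <- datasets::mtcars
cars$am <- factor(cars$am, levels = c(0, 1), labels = c("Automatic", "Manual"))

# Refit the formula that the stepwise procedure selected
fit <- lm(mpg ~ wt + qsec + am, data = cars)

# Estimated mpg difference (manual minus automatic), holding wt and qsec fixed
am_effect <- coef(fit)["amManual"]
am_effect

# 95% confidence interval for that difference
am_ci <- confint(fit, "amManual", level = 0.95)
am_ci
```

The interval gives a range of plausible values for the adjusted difference, which is more informative than the point estimate alone given that the p-value is close to 0.05.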

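The stepwise procedure retains three predictors: wt, qsec and am. Holding wt and qsec constant, manual cars get about 2.94 mpg more than automatic cars (p = 0.047), a much smaller gap than the unadjusted difference between the group means. With this information we can try to improve the model by letting the effects of wt and qsec differ by transmission type: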

optimized.model <- lm(mpg ~ factor(am):wt + factor(am):qsec, data = mtcars)
summary(optimized.model)
## 
## Call:
## lm(formula = mpg ~ factor(am):wt + factor(am):qsec, data = mtcars)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -3.9361 -1.4017 -0.1551  1.2695  3.8862 
## 
## Coefficients:
##                          Estimate Std. Error t value Pr(>|t|)    
## (Intercept)               13.9692     5.7756   2.419  0.02259 *  
## factor(am)Automatic:wt    -3.1759     0.6362  -4.992 3.11e-05 ***
## factor(am)Manual:wt       -6.0992     0.9685  -6.297 9.70e-07 ***
## factor(am)Automatic:qsec   0.8338     0.2602   3.205  0.00346 ** 
## factor(am)Manual:qsec      1.4464     0.2692   5.373 1.12e-05 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2.097 on 27 degrees of freedom
## Multiple R-squared:  0.8946, Adjusted R-squared:  0.879 
## F-statistic: 57.28 on 4 and 27 DF,  p-value: 8.424e-13

Our optimized model has an R-squared of 0.89 (adjusted 0.88), higher than the 0.85 (adjusted 0.83) of the stepwise model.
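The two fits can be put side by side in a small table. The sketch below is one way to do it, under the assumption that both models are refitted on a copy of the data; AIC is included because it is the criterion that step uses by default, and the names `stepwise_fit`, `optimized_fit` and `comparison` are introduced here for illustration.

```r
# Fresh copy of the data with am recoded as a labelled factor
cars <- datasets::mtcars
cars$am <- factor(cars$am, levels = c(0, 1), labels = c("Automatic", "Manual"))

# The model selected by step() and the model with transmission-specific slopes
stepwise_fit  <- lm(mpg ~ wt + qsec + am, data = cars)
optimized_fit <- lm(mpg ~ am:wt + am:qsec, data = cars)

# Collect the fit statistics of both models in one table
comparison <- data.frame(
  model         = c("stepwise", "optimized"),
  r_squared     = c(summary(stepwise_fit)$r.squared,
                    summary(optimized_fit)$r.squared),
  adj_r_squared = c(summary(stepwise_fit)$adj.r.squared,
                    summary(optimized_fit)$adj.r.squared),
  aic           = c(AIC(stepwise_fit), AIC(optimized_fit))
)
comparison
```

Adjusted R-squared and AIC both penalize the extra coefficient of the optimized model, so they are a fairer basis for comparison than the raw R-squared.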

par(mfrow=c(2,2))
plot(optimized.model)

6. Final choice

6.1) The results indicate that cars with automatic transmission have a statistically significantly lower mean mpg (17.15) than cars with manual transmission (24.39); that is, automatic cars have the higher fuel consumption.

6.2) The R-squared of the optimized model (0.89; adjusted 0.88) indicates that it explains most of the variance in mpg and fits the data well.

6.3) Transmission type is not the only variable that determines fuel consumption. It also depends on the weight of the car (wt) and on its quarter-mile time (qsec), a proxy for acceleration.