Executive summary

Motor Trend magazine has developed an analysis of fuel efficiency (MPG) across several car models, contrasting the two transmission types under review: automatic and manual. Two questions are addressed: is an automatic or manual transmission better for MPG, and how large is the MPG difference between the two? The results indicate that manual transmission cars achieve higher MPG than automatic ones, although the effect depends not only on the transmission type but also on the relation between transmission and car weight. A two-sample t-test shows that the mean MPG of manual transmission cars is significantly higher than that of automatic transmission cars.

Exploratory data analysis

Library

library(datasets)
library(ggplot2)
library(ggfortify)
data("mtcars")

Exploratory data analysis

head(mtcars)

##                    mpg cyl disp  hp drat    wt  qsec vs am gear carb
## Mazda RX4         21.0   6  160 110 3.90 2.620 16.46  0  1    4    4
## Mazda RX4 Wag     21.0   6  160 110 3.90 2.875 17.02  0  1    4    4
## Datsun 710        22.8   4  108  93 3.85 2.320 18.61  1  1    4    1
## Hornet 4 Drive    21.4   6  258 110 3.08 3.215 19.44  1  0    3    1
## Hornet Sportabout 18.7   8  360 175 3.15 3.440 17.02  0  0    3    2
## Valiant           18.1   6  225 105 2.76 3.460 20.22  1  0    3    1

According to the exploratory analysis (see the pairs plot in the Appendix), mpg is most strongly correlated with “cyl”, “wt”, “disp” and “hp”.
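The correlation claim can be checked numerically. The following is a minimal sketch, not part of the original analysis; the object names `mpg_cor` and `mpg_cor_sorted` are chosen here for illustration.

```r
# Correlation of every variable with mpg, ordered by absolute strength
data("mtcars")

mpg_cor <- cor(mtcars)["mpg", ]               # row of the correlation matrix for mpg
mpg_cor <- mpg_cor[names(mpg_cor) != "mpg"]   # drop the trivial self-correlation

# Strongest relationships first, regardless of sign
mpg_cor_sorted <- mpg_cor[order(abs(mpg_cor), decreasing = TRUE)]
round(mpg_cor_sorted, 3)
```

Sorting by absolute value keeps strong negative relationships (heavier cars, lower mpg) at the top of the list alongside strong positive ones.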

Inference

The question is whether an automatic or a manual transmission is better for mpg. Since mpg is a continuous performance variable and transmission (am) is a discrete variable, there must be other factors that can also affect mpg. The null hypothesis (H0) is that there is no difference in mean mpg between manual and automatic transmissions, while the alternative hypothesis (H1) is that a difference exists. Under H0, manual and automatic cars come from the same population, so their mpg distributions tend to be the same. To decide whether to reject H0, I ran a two-sample t-test:

test_am <- t.test(mtcars$mpg~ mtcars$am)
test_am$p.value

## [1] 0.001373638

test_am$estimate

## mean in group 0 mean in group 1 
##        17.14737        24.39231

The p-value (0.0014) is below 0.05, so the null hypothesis can be rejected: the two transmission types appear to come from different populations. The mean MPG of manual transmissions (24.39) is about 7.2 higher than that of automatic transmissions (17.15).
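To quantify the difference as well as test it, the same t-test object also carries a confidence interval. The sketch below assumes the default settings of `t.test`, which runs the Welch test (unequal variances); `mean_diff` is a name introduced here for illustration.

```r
data("mtcars")

# Welch two-sample t-test of mpg by transmission (0 = automatic, 1 = manual)
test_am <- t.test(mpg ~ am, data = mtcars)

# Difference in group means, manual minus automatic
mean_diff <- unname(test_am$estimate[2] - test_am$estimate[1])
mean_diff

# 95% confidence interval for (automatic - manual); an interval that
# excludes zero agrees with rejecting the null hypothesis
test_am$conf.int
```

Note that the interval is reported for group 0 minus group 1, so its sign is the opposite of `mean_diff`.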

Regression analysis

The first model was fit with all variables as predictors, paying particular attention to the “am” coefficient, since transmission is the primary question.

complete_model <- lm(mpg ~ ., data = mtcars)
complete_model$coefficients

## (Intercept)         cyl        disp          hp        drat          wt 
## 12.30337416 -0.11144048  0.01333524 -0.02148212  0.78711097 -3.71530393 
##        qsec          vs          am        gear        carb 
##  0.82104075  0.31776281  2.52022689  0.65541302 -0.19941925
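The significance of each coefficient in the full model can be read from the summary table. This is a small sketch for checking it; `p_values` and `any_significant` are illustrative names, not objects from the original report.

```r
data("mtcars")
complete_model <- lm(mpg ~ ., data = mtcars)

# Coefficient table: estimate, standard error, t value and p-value per term
coef_table <- summary(complete_model)$coefficients
p_values <- coef_table[, "Pr(>|t|)"]
round(sort(p_values), 4)

# TRUE if any predictor (intercept excluded) is significant at the 5% level
any_significant <- any(p_values[names(p_values) != "(Intercept)"] < 0.05)
any_significant
```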

The residual standard error is 2.637 on 21 degrees of freedom. The adjusted R-squared is 0.81, so the model explains roughly 80% of the variance of the MPG variable. However, none of the coefficients is significant at the 95% level, so it is necessary to select the statistically significant variables:

model2 <- step(complete_model, k=log(nrow(mtcars)))

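The model was selected with step(), which by default uses the “Akaike Information Criterion” (AIC) to determine which model fits better; setting k = log(n), as done here, applies the heavier penalty of the Bayesian Information Criterion (BIC). In this particular case the model selected was: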

M1: MPG ~ WT + QSEC + AM
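The selection step and the fit statistics of the chosen model can be reproduced quietly as follows. This is a sketch under the assumption that the variables are left numeric, as in the rest of the report; `bic_model`, `selected_terms` and `fit_stats` are names introduced here.

```r
data("mtcars")
complete_model <- lm(mpg ~ ., data = mtcars)
n <- nrow(mtcars)

# k = log(n) gives the BIC penalty; the default k = 2 would give AIC.
# trace = 0 suppresses the step-by-step log.
bic_model <- step(complete_model, k = log(n), trace = 0)
formula(bic_model)

# Terms kept by the selection and headline fit statistics
selected_terms <- attr(terms(bic_model), "term.labels")
fit_stats <- c(r_squared = summary(bic_model)$r.squared,
               sigma     = summary(bic_model)$sigma)
round(fit_stats, 4)
```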

This model, with only 3 variables, explains about 85% of the variance (R^2 of 0.8497) and has a residual standard error of 2.459 on 28 degrees of freedom; the p-value of each variable is significant (p-value < 0.05). However, there is an interaction between weight and transmission, since the automatic transmission cars are heavier than the manual ones, so I fit another model to test it:

m2 <- lm(mpg~qsec+am+wt:am, data = mtcars)
summary(m2)

## 
## Call:
## lm(formula = mpg ~ qsec + am + wt:am, data = mtcars)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -6.4694 -0.9707  0.1469  1.8014  4.7670 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  -7.7295     5.6328  -1.372 0.180885    
## qsec          1.3681     0.3079   4.443 0.000127 ***
## am           23.7657     3.4011   6.988 1.34e-07 ***
## am:wt        -6.3851     1.3950  -4.577 8.81e-05 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2.684 on 28 degrees of freedom
## Multiple R-squared:  0.8209, Adjusted R-squared:  0.8017 
## F-statistic: 42.77 on 3 and 28 DF,  p-value: 1.386e-10

To put this model in context, I also ran a simpler model that relates the MPG variable only to the AM variable:

mp <- lm(mpg~am, data=mtcars)
summary(mp)

## 
## Call:
## lm(formula = mpg ~ am, data = mtcars)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -9.3923 -3.0923 -0.2974  3.2439  9.5077 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   17.147      1.125  15.247 1.13e-15 ***
## am             7.245      1.764   4.106 0.000285 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 4.902 on 30 degrees of freedom
## Multiple R-squared:  0.3598, Adjusted R-squared:  0.3385 
## F-statistic: 16.86 on 1 and 30 DF,  p-value: 0.000285
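Because mp is nested in m2 (m2 adds qsec and the am:wt term), the two fits can also be compared with an F-test. This comparison is not in the original report; it is a hedged sketch, and `model_comparison` and `f_p_value` are illustrative names.

```r
data("mtcars")
mp <- lm(mpg ~ am, data = mtcars)
m2 <- lm(mpg ~ qsec + am + wt:am, data = mtcars)

# Nested-model F-test: does adding qsec and am:wt reduce the residual
# sum of squares by more than chance would explain?
model_comparison <- anova(mp, m2)
model_comparison

# p-value of the F-test (second row holds the comparison)
f_p_value <- model_comparison[["Pr(>F)"]][2]
f_p_value
```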

The results show that model m2 has an R-squared of 0.8209, with a residual standard error of 2.684 on 28 degrees of freedom. On the other hand, model mp has an R-squared of 0.3598, with a residual standard error of 4.902 on 30 degrees of freedom. Now for the selection of the final model:

m2$coefficients

## (Intercept)        qsec          am       am:wt 
##   -7.729507    1.368127   23.765661   -6.385125

The coefficients show that, when QSEC remains constant, cars with manual transmission get 23.77+(-6.39)*WT more MPG than cars with automatic transmission, with WT measured in units of 1000 lbs. For example, a manual transmission car that weighs 2000 lbs gets about 11.0 more MPG than an automatic transmission car with the same weight and QSEC values.
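The arithmetic behind this interpretation can be reproduced directly from the fitted coefficients. The helper `manual_advantage` and the value `break_even_wt` below are hypothetical names introduced for illustration; weight is in units of 1000 lbs, as in mtcars.

```r
data("mtcars")
m2 <- lm(mpg ~ qsec + am + wt:am, data = mtcars)
b <- coef(m2)

# Estimated MPG gain of a manual car over an automatic one at equal qsec,
# as a function of weight (in 1000 lbs): am + (am:wt) * wt
manual_advantage <- function(wt) unname(b["am"] + b["am:wt"] * wt)
manual_advantage(c(2, 3))

# Weight at which the estimated advantage of manual cars falls to zero
break_even_wt <- unname(-b["am"] / b["am:wt"])
break_even_wt
```

Since the am:wt coefficient is negative, the estimated advantage of a manual transmission shrinks as the car gets heavier.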

Residuals analysis

Please refer to the Appendix section for the plots. According to the residual plots, the following underlying assumptions can be verified: A. The Residuals vs. Fitted plot shows no consistent pattern, supporting the independence assumption. B. The Normal Q-Q plot indicates that the residuals are approximately normally distributed, because the points lie close to the line. C. The Scale-Location plot supports the constant variance assumption, as the points are randomly distributed. D. The Residuals vs. Leverage plot suggests that no outliers are present, as all values fall well within the 0.5 bands.
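As a numeric complement to the Q-Q plot, a Shapiro-Wilk test can be run on the residuals. This check is not part of the original analysis and is offered only as a sketch; `sw` is an illustrative name.

```r
data("mtcars")
m2 <- lm(mpg ~ qsec + am + wt:am, data = mtcars)

# Shapiro-Wilk test: the null hypothesis is that the residuals are normal,
# so a large p-value gives no evidence against normality
sw <- shapiro.test(residuals(m2))
sw$p.value
```

With only 32 observations the test has limited power, so it is best read together with the plots rather than in place of them.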

As for the dfbetas, which measure how much each observation has affected the estimate of a regression coefficient, the number of values exceeding 1 in absolute value is:

sum((abs(dfbetas(m2)))>1)

## [1] 0
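The count above can be broken down per coefficient, and leverage can be inspected alongside it. This is a minimal sketch; `max_abs_dfb` and `top_leverage` are names introduced here, and the cutoff of 1 is the one used in this report.

```r
data("mtcars")
m2 <- lm(mpg ~ qsec + am + wt:am, data = mtcars)

# Largest absolute dfbeta for each coefficient; values above 1 would flag
# an observation that shifts that estimate by more than one standard error
dfb <- dfbetas(m2)
max_abs_dfb <- apply(abs(dfb), 2, max)
round(max_abs_dfb, 3)

# The three cars with the highest leverage (hat values)
top_leverage <- head(sort(hatvalues(m2), decreasing = TRUE), 3)
round(top_leverage, 3)
```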

Appendix

Exploratory data analysis

mtcars$cyl <- as.numeric(mtcars$cyl)
mtcars$vs <- as.numeric(mtcars$vs)
mtcars$am <- as.numeric(mtcars$am)
mtcars$gear <- as.numeric(mtcars$gear)
mtcars$carb <- as.numeric(mtcars$carb)

Covariance test

pairs(mtcars, panel = panel.smooth, main = "Pair graph Road Test MT-Cars")

Mean boxplot

boxplot(mtcars$mpg~mtcars$am, xlab="Transmission (0 = Automatic, 1 = Manual)", ylab="MPG",main="Boxplot of MPG vs. Transmission")

MPG vs weight by transmission

mtcars$am <- as.factor(mtcars$am)
# height and width are not aesthetics of geom_point, so they are omitted
plt3 <- ggplot(data=mtcars, aes(x=wt, y=mpg)) +
        geom_point(aes(group=am, color=am)) +
        scale_colour_discrete(labels=c("Automatic", "Manual")) + 
        xlab("Weight") + 
        ggtitle("MPG related Weight by Transmission")+
        theme(panel.border = element_blank(),
              panel.grid.major = element_blank(),
              panel.grid.minor = element_blank(), 
              axis.line = element_line(colour = "black"))

plt3

Residuals plots

autoplot(m2, smooth.colour=NA)

## Warning: Removed 32 row(s) containing missing values (geom_path).
## Removed 32 row(s) containing missing values (geom_path).
## Removed 32 row(s) containing missing values (geom_path).

Motor Trend - MPG analysis

Felipe Wolff

2023-05-15
