We obtained data from the 1974 issue of the American magazine Motor Trend. The data cover fuel consumption and 10 aspects of car design and performance for 32 cars (1973–74 models). This report analyzes what affects miles per gallon and whether a manual or an automatic transmission has more impact, taking into account the other variables involved.
The packages to be used for the project are:
library(datasets)
library(ggplot2)
library(viridis)
library(GGally)
The “datasets” package will be used to obtain the data.
library(datasets)
We get the official description of the data we use.
?mtcars
We will use the “mtcars” data, described as follows: the data was extracted from the 1974 Motor Trend US magazine, and comprises fuel consumption and 10 aspects of automobile design and performance for 32 automobiles (1973–74 models).
We look at the names and description of the columns.
[, 1] mpg Miles/(US) gallon
[, 2] cyl Number of cylinders
[, 3] disp Displacement (cu.in.)
[, 4] hp Gross horsepower
[, 5] drat Rear axle ratio
[, 6] wt Weight (1000 lbs)
[, 7] qsec 1/4 mile time
[, 8] vs Engine (0 = V-shaped, 1 = straight)
[, 9] am Transmission (0 = automatic, 1 = manual)
[,10] gear Number of forward gears
[,11] carb Number of carburetors
We perform a general analysis on the data.
str(mtcars)
## 'data.frame': 32 obs. of 11 variables:
## $ mpg : num 21 21 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2 ...
## $ cyl : num 6 6 4 6 8 6 8 4 4 6 ...
## $ disp: num 160 160 108 258 360 ...
## $ hp : num 110 110 93 110 175 105 245 62 95 123 ...
## $ drat: num 3.9 3.9 3.85 3.08 3.15 2.76 3.21 3.69 3.92 3.92 ...
## $ wt : num 2.62 2.88 2.32 3.21 3.44 ...
## $ qsec: num 16.5 17 18.6 19.4 17 ...
## $ vs : num 0 0 1 1 0 1 0 1 1 1 ...
## $ am : num 1 1 1 0 0 0 0 0 0 0 ...
## $ gear: num 4 4 4 3 3 3 3 4 4 4 ...
## $ carb: num 4 4 1 1 2 1 4 2 2 4 ...
A solid starting point for analysis is to look at the relationship of miles per gallon to the other variables by looking at their correlation.
cor(mtcars$mpg,mtcars[,-1])
## cyl disp hp drat wt qsec vs
## [1,] -0.852162 -0.8475514 -0.7761684 0.6811719 -0.8676594 0.418684 0.6640389
## am gear carb
## [1,] 0.5998324 0.4802848 -0.5509251
The variables most strongly negatively correlated with miles per gallon are wt, cyl and disp.
The variables most strongly positively correlated with miles per gallon are drat, vs and am.
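As a minimal sketch, the same correlations can be ranked by absolute value so that the strongest relationships appear first. The names `cars`, `mpg_cor` and `ranked` are illustrative and do not appear in the original analysis.

```r
# Work on a fresh copy so later changes to mtcars
# (for example converting am to a factor) do not matter here.
cars <- datasets::mtcars

# Correlation of mpg with every other column, as a named vector.
mpg_cor <- cor(cars$mpg, cars[, -1])[1, ]

# Order by absolute value: strongest relationships first, whatever the sign.
ranked <- mpg_cor[order(abs(mpg_cor), decreasing = TRUE)]
print(round(ranked, 3))
```

Ranking by magnitude rather than by sign makes it easier to see that hp, although negative, is more strongly related to mpg than any of the positively correlated variables.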
We now examine the dichotomous transmission variable am, recalling that 0 = automatic and 1 = manual.
In Appendix 1 we observe the behavior of this variable.
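We run a Welch two-sample t-test of the null hypothesis that mean miles per gallon is equal for the two transmission types. The alternative hypothesis is that automatic cars (group 0) achieve fewer miles per gallon than manual cars (group 1).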
t.test(mtcars$mpg~mtcars$am,conf.level=0.95, alternative = "less")
##
## Welch Two Sample t-test
##
## data: mtcars$mpg by mtcars$am
## t = -3.7671, df = 18.332, p-value = 0.0006868
## alternative hypothesis: true difference in means is less than 0
## 95 percent confidence interval:
## -Inf -3.913256
## sample estimates:
## mean in group 0 mean in group 1
## 17.14737 24.39231
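The p-value of 0.0006868 allows us to reject the null hypothesis in favor of the alternative: on average, automatic cars achieve fewer miles per gallon than manual cars (17.15 versus 24.39). This comparison does not hold the other variables constant, so part of the difference may reflect other characteristics of the cars.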
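A hedged sketch of the same test using the formula interface with a `data` argument, alongside a two-sided version that gives a finite confidence interval for the difference in means. The object names `cars`, `one_sided` and `two_sided` are illustrative only.

```r
cars <- datasets::mtcars

# One-sided Welch test, as in the report:
# H1 is mean(mpg | am = 0) < mean(mpg | am = 1).
one_sided <- t.test(mpg ~ am, data = cars, alternative = "less")

# Two-sided version: same t statistic, but the interval for the
# difference (automatic minus manual) is bounded on both sides.
two_sided <- t.test(mpg ~ am, data = cars)

print(one_sided$p.value)
print(two_sided$conf.int)
```

Because the observed t statistic is negative, the two-sided p-value is twice the one-sided one, so the conclusion does not depend on the choice of alternative.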
We use stepwise selection based on the Akaike information criterion (AIC) to choose the model that best explains the data.
mtcars$am <- factor(mtcars$am)
lmodel <- lm(data = mtcars, mpg ~ am + cyl + disp + hp + drat + wt + qsec + vs + gear + carb)
O_model <- step(lmodel, direction = "both")
The anova component of the result records each step of the selection, showing which variable was dropped and the AIC reached at that step.
O_model$anova
## Step Df Deviance Resid. Df Resid. Dev AIC
## 1 NA NA 21 147.4944 70.89774
## 2 - cyl 1 0.07987121 22 147.5743 68.91507
## 3 - vs 1 0.26852280 23 147.8428 66.97324
## 4 - carb 1 0.68546077 24 148.5283 65.12126
## 5 - gear 1 1.56497053 25 150.0933 63.45667
## 6 - drat 1 3.34455117 26 153.4378 62.16190
## 7 - disp 1 6.62865369 27 160.0665 61.51530
## 8 - hp 1 9.21946935 28 169.2859 61.30730
The stepwise search, which selects by AIC rather than by an F-test, ends with the following model:
O_model
##
## Call:
## lm(formula = mpg ~ am + wt + qsec, data = mtcars)
##
## Coefficients:
## (Intercept) am1 wt qsec
## 9.618 2.936 -3.917 1.226
According to the step function, the variables that best explain the data are wt (weight), qsec and am.
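The selection can be reproduced without the printed trace, and the AIC of the full and selected models compared directly. This is a sketch under the same model formula as above; `cars`, `full` and `selected` are illustrative names, and `extractAIC` is the function `step` uses internally to score models.

```r
cars <- datasets::mtcars
cars$am <- factor(cars$am)

# Full model with all ten predictors, as in the report.
full <- lm(mpg ~ am + cyl + disp + hp + drat + wt + qsec + vs + gear + carb,
           data = cars)

# Stepwise search in both directions; trace = 0 suppresses the step log.
selected <- step(full, direction = "both", trace = 0)
print(formula(selected))

# extractAIC() returns c(edf, AIC); keep the AIC part for each model.
print(c(full = extractAIC(full)[2], selected = extractAIC(selected)[2]))
```

A lower AIC indicates a better trade-off between fit and number of parameters, which is why the search stops once no single addition or removal lowers it further.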
summary(O_model)
##
## Call:
## lm(formula = mpg ~ am + wt + qsec, data = mtcars)
##
## Residuals:
## Min 1Q Median 3Q Max
## -3.4811 -1.5555 -0.7257 1.4110 4.6610
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 9.6178 6.9596 1.382 0.177915
## am1 2.9358 1.4109 2.081 0.046716 *
## wt -3.9165 0.7112 -5.507 6.95e-06 ***
## qsec 1.2259 0.2887 4.247 0.000216 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.459 on 28 degrees of freedom
## Multiple R-squared: 0.8497, Adjusted R-squared: 0.8336
## F-statistic: 52.75 on 3 and 28 DF, p-value: 1.21e-11
The model explains 84.97% of the variance in miles per gallon (adjusted R-squared 0.8336). All three predictors are significant at the 0.05 level according to their t-tests, although am only marginally so (p = 0.0467). Holding the other predictors fixed, each additional 1000 lb of weight is associated with a decrease of 3.9165 miles per gallon, and each additional second of quarter-mile time (qsec) with an increase of 1.2259. Finally, a manual transmission is associated with an increase of 2.9358 miles per gallon compared to an automatic transmission for cars of the same weight and qsec.
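To illustrate how the coefficients are read, the sketch below refits the selected model, prints 95% confidence intervals, and predicts miles per gallon for two hypothetical cars that differ only in transmission. The values wt = 3.0 and qsec = 18.0 are assumed for illustration, as are the names `cars`, `fit`, `new_cars` and `pred`.

```r
cars <- datasets::mtcars
cars$am <- factor(cars$am, labels = c("automatic", "manual"))

# Same model as selected by step(), refitted with labelled factor levels.
fit <- lm(mpg ~ am + wt + qsec, data = cars)

# 95% confidence intervals for each coefficient.
print(confint(fit))

# Two hypothetical cars with identical weight and quarter-mile time
# (illustrative values), differing only in transmission type.
new_cars <- data.frame(
  am   = factor(c("automatic", "manual"), levels = levels(cars$am)),
  wt   = 3.0,
  qsec = 18.0
)
pred <- predict(fit, newdata = new_cars)

# The gap between the two predictions equals the manual coefficient.
print(diff(pred))
```

The width of the interval for the transmission coefficient gives a sense of how uncertain that estimate is with only 32 cars.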
In this model, a manual transmission is associated with higher miles per gallon than an automatic transmission once weight and qsec, which also significantly affect miles per gallon, are taken into account.
Appendix 1: distribution of miles per gallon by transmission type.
ggplot(mtcars, aes(x = factor(am), y = mpg)) +
  geom_violin(aes(fill = factor(am)), color = "#FAA3F4") +
  scale_fill_viridis(discrete = TRUE, alpha = 0.6, option = "A") +
  xlab("Transmission, 0 automatic, 1 manual") +
  ylab("Miles/(US) gallon")
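As a numeric companion to the violin plot, the following base R sketch summarizes miles per gallon within each transmission group. The names `cars` and `stats` are illustrative and not part of the original analysis.

```r
cars <- datasets::mtcars

# Split mpg by transmission (0 = automatic, 1 = manual) and
# compute a few summary statistics for each group.
stats <- sapply(
  split(cars$mpg, cars$am),
  function(x) c(n = length(x), mean = mean(x), median = median(x), sd = sd(x))
)
print(round(stats, 2))
```

The columns of `stats` are labelled "0" and "1", matching the coding of am used throughout.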