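4.1) The results indicate that cars with automatic transmission have a statistically significantly lower fuel efficiency (fewer miles per gallon, i.e. higher fuel consumption) than cars with manual transmission.
4.2) The optimized model has an R-squared of 0.89 (adjusted R-squared 0.88). It explains about 89% of the variability in mpg, which indicates a good fit.
4.3) The type of transmission is not the only variable that determines fuel consumption. Fuel consumption also depends on the weight of the car (wt) and on its acceleration, measured as the quarter-mile time (qsec). In the optimized model, each additional 1,000 lb lowers mpg by about 6.10 for manual cars and about 3.18 for automatic cars.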
Getting the data:
data(mtcars)
See the variables that make up the data set:
names(mtcars)
## [1] "mpg" "cyl" "disp" "hp" "drat" "wt" "qsec" "vs" "am" "gear"
## [11] "carb"
Summary statistics for each variable:
summary(mtcars)
##       mpg             cyl            disp             hp
##  Min.   :10.40   Min.   :4.000   Min.   : 71.1   Min.   : 52.0
##  1st Qu.:15.43   1st Qu.:4.000   1st Qu.:120.8   1st Qu.: 96.5
##  Median :19.20   Median :6.000   Median :196.3   Median :123.0
##  Mean   :20.09   Mean   :6.188   Mean   :230.7   Mean   :146.7
##  3rd Qu.:22.80   3rd Qu.:8.000   3rd Qu.:326.0   3rd Qu.:180.0
##  Max.   :33.90   Max.   :8.000   Max.   :472.0   Max.   :335.0
##       drat            wt             qsec              vs
##  Min.   :2.760   Min.   :1.513   Min.   :14.50   Min.   :0.0000
##  1st Qu.:3.080   1st Qu.:2.581   1st Qu.:16.89   1st Qu.:0.0000
##  Median :3.695   Median :3.325   Median :17.71   Median :0.0000
##  Mean   :3.597   Mean   :3.217   Mean   :17.85   Mean   :0.4375
##  3rd Qu.:3.920   3rd Qu.:3.610   3rd Qu.:18.90   3rd Qu.:1.0000
##  Max.   :4.930   Max.   :5.424   Max.   :22.90   Max.   :1.0000
##        am             gear            carb
##  Min.   :0.0000   Min.   :3.000   Min.   :1.000
##  1st Qu.:0.0000   1st Qu.:3.000   1st Qu.:2.000
##  Median :0.0000   Median :4.000   Median :2.000
##  Mean   :0.4062   Mean   :3.688   Mean   :2.812
##  3rd Qu.:1.0000   3rd Qu.:4.000   3rd Qu.:4.000
##  Max.   :1.0000   Max.   :5.000   Max.   :8.000
Check correlation of the mpg variable and the other variables:
cor(mtcars$mpg, mtcars[,-1])
##            cyl       disp         hp      drat         wt     qsec       vs
## [1,] -0.852162 -0.8475514 -0.7761684 0.6811719 -0.8676594 0.418684 0.6640389
##             am      gear       carb
## [1,] 0.5998324 0.4802848 -0.5509251
Visually the correlation between variables can be seen like this:
pairs(mpg ~ ., data=mtcars)
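The strongest correlations with mpg are all negative: 1) wt (-0.87), 2) cyl (-0.85) and 3) disp (-0.85). Heavier cars, cars with more cylinders and cars with a larger engine displacement tend to travel fewer miles per gallon.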
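The ranking above can be reproduced with a minimal base-R sketch that orders the correlations with `mpg` by absolute value, so that strong negative correlations rank as high as strong positive ones. It assumes the untouched built-in data, read explicitly as `datasets::mtcars` so that a later recoding of `am` does not affect it; `r` and `ranked` are illustrative names.

```r
# Read the built-in data explicitly so a modified copy in the session is ignored
mt <- datasets::mtcars

# Correlation of mpg with every other column (mpg is column 1, so it is dropped)
r <- cor(mt)["mpg", -1]

# Order by absolute value: strong negative correlations rank high too
ranked <- r[order(abs(r), decreasing = TRUE)]
print(round(ranked, 3))
```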
Before comparing the two transmission types, convert am from numeric (0 = automatic, 1 = manual) to a factor with descriptive labels. This makes the plots and model output easier to interpret:
mtcars$am <- as.factor(mtcars$am)
levels(mtcars$am) <- c("Automatic","Manual")
A boxplot gives a first visual impression of how mpg differs between the two transmission types:
library(ggplot2)
ggplot(mtcars, aes(x = am, y = mpg, fill = am)) +
  geom_boxplot() +
  # Jitter horizontally only, so the plotted mpg values stay exact
  geom_jitter(width = 0.2, height = 0, color = "black", size = 0.4, alpha = 0.9) +
  theme(legend.position = "none",
        plot.title = element_text(size = 11)) +
  ggtitle("Boxplot of mpg vs transmission type") +
  xlab("Transmission type") +
  ylab("Miles per gallon (mpg)")
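The boxplot can be complemented with the numbers behind it. The sketch below tabulates, for each transmission type, the number of cars and the mean, median and standard deviation of `mpg`. It assumes a standalone copy of the data named `cars_df` (a hypothetical name), recoded the same way as `am` above, so it can be run on its own.

```r
# Standalone copy, recoded as in the text: 0 = automatic, 1 = manual
cars_df <- datasets::mtcars
cars_df$am <- factor(cars_df$am, levels = c(0, 1),
                     labels = c("Automatic", "Manual"))

# tapply() returns one value per factor level, in level order
summary_by_type <- data.frame(
  n         = as.vector(tapply(cars_df$mpg, cars_df$am, length)),
  mean      = as.vector(tapply(cars_df$mpg, cars_df$am, mean)),
  median    = as.vector(tapply(cars_df$mpg, cars_df$am, median)),
  sd        = as.vector(tapply(cars_df$mpg, cars_df$am, sd)),
  row.names = levels(cars_df$am)
)
print(summary_by_type)
```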
To check whether this difference is statistically significant, we run a Welch two-sample t-test (by default, t.test() does not assume equal variances in the two groups):
test.T <- t.test(mtcars$mpg ~ mtcars$am, conf.level=0.95)
test.T
##
## Welch Two Sample t-test
##
## data: mtcars$mpg by mtcars$am
## t = -3.7671, df = 18.332, p-value = 0.001374
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -11.280194 -3.209684
## sample estimates:
## mean in group Automatic mean in group Manual
## 17.14737 24.39231
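The p-value (0.0014) is below **0.05**, so we reject the null hypothesis that both groups have the same mean mpg. Cars with automatic transmission average 17.1 mpg, compared with 24.4 mpg for manual cars (95% confidence interval for the difference: -11.28 to -3.21 mpg). In other words, automatic cars travel fewer miles per gallon and therefore have a higher fuel consumption. This simple comparison does not yet account for other differences between the two groups of cars, such as their weight.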
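To show where the t statistic comes from, the sketch below recomputes the Welch test by hand. The statistic is the difference in group means divided by the unpooled standard error, sqrt(s1^2/n1 + s2^2/n2), and the degrees of freedom use the Welch-Satterthwaite approximation. It should reproduce t = -3.7671 and df = 18.332 from the output above. It assumes a standalone recoded copy named `cars_df`; `x`, `y`, `se2_x` and `se2_y` are illustrative names.

```r
cars_df <- datasets::mtcars
cars_df$am <- factor(cars_df$am, levels = c(0, 1),
                     labels = c("Automatic", "Manual"))

x <- cars_df$mpg[cars_df$am == "Automatic"]
y <- cars_df$mpg[cars_df$am == "Manual"]

# Squared standard errors of each group mean (variances are not pooled)
se2_x <- var(x) / length(x)
se2_y <- var(y) / length(y)

# Welch statistic: difference in means over the unpooled standard error
t_stat <- (mean(x) - mean(y)) / sqrt(se2_x + se2_y)

# Welch-Satterthwaite approximation to the degrees of freedom
df_welch <- (se2_x + se2_y)^2 /
  (se2_x^2 / (length(x) - 1) + se2_y^2 / (length(y) - 1))

# Two-sided p-value from the t distribution
p_value <- 2 * pt(-abs(t_stat), df = df_welch)

# Built-in test for comparison
builtin <- t.test(mpg ~ am, data = cars_df)
print(c(t = t_stat, df = df_welch, p = p_value))
```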
Next, we fit a model by stepwise selection based on AIC. The selection starts from the full model with all the variables and allows up to 10,000 steps:
stepwise.model <- step(lm(data=mtcars, mpg ~ .), trace=0, steps=10000)
summary(stepwise.model)
##
## Call:
## lm(formula = mpg ~ wt + qsec + am, data = mtcars)
##
## Residuals:
##     Min      1Q  Median      3Q     Max
## -3.4811 -1.5555 -0.7257  1.4110  4.6610
##
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept)   9.6178     6.9596   1.382 0.177915
## wt           -3.9165     0.7112  -5.507 6.95e-06 ***
## qsec          1.2259     0.2887   4.247 0.000216 ***
## amManual      2.9358     1.4109   2.081 0.046716 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.459 on 28 degrees of freedom
## Multiple R-squared: 0.8497, Adjusted R-squared: 0.8336
## F-statistic: 52.75 on 3 and 28 DF, p-value: 1.21e-11
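`step()` chooses between candidate models by AIC. As an illustration, the sketch below refits the full model and the selected model and evaluates the criterion that `step()` uses for linear models: n * log(RSS / n) + 2 * k, where k is the number of estimated coefficients. This is the same quantity that `extractAIC()` returns. The helper `aic_step()` and the copy `cars_df` are hypothetical names introduced for this example.

```r
cars_df <- datasets::mtcars
cars_df$am <- factor(cars_df$am, levels = c(0, 1),
                     labels = c("Automatic", "Manual"))

full_model     <- lm(mpg ~ ., data = cars_df)
stepwise_model <- step(full_model, trace = 0)

# Criterion used by step() for lm fits: n * log(RSS / n) + 2 * k
aic_step <- function(fit) {
  n <- length(residuals(fit))
  n * log(sum(residuals(fit)^2) / n) + 2 * length(coef(fit))
}

print(c(full = aic_step(full_model), stepwise = aic_step(stepwise_model)))
print(formula(stepwise_model))
```

A lower value is better, so the selected model should score below the full model even though it fits the data slightly worse, because it uses 4 coefficients instead of 11.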
The stepwise selection keeps three predictors: wt, qsec and am. Building on this, we fit a model in which the slopes of wt and qsec are allowed to differ by transmission type (interaction terms with a common intercept):
optimized.model <- lm(mpg ~ factor(am):wt + factor(am):qsec, data=mtcars)
summary(optimized.model)
##
## Call:
## lm(formula = mpg ~ factor(am):wt + factor(am):qsec, data = mtcars)
##
## Residuals:
##     Min      1Q  Median      3Q     Max
## -3.9361 -1.4017 -0.1551  1.2695  3.8862
##
## Coefficients:
##                           Estimate Std. Error t value Pr(>|t|)
## (Intercept)               13.9692     5.7756   2.419  0.02259 *
## factor(am)Automatic:wt    -3.1759     0.6362  -4.992 3.11e-05 ***
## factor(am)Manual:wt       -6.0992     0.9685  -6.297 9.70e-07 ***
## factor(am)Automatic:qsec   0.8338     0.2602   3.205  0.00346 **
## factor(am)Manual:qsec      1.4464     0.2692   5.373 1.12e-05 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.097 on 27 degrees of freedom
## Multiple R-squared: 0.8946, Adjusted R-squared: 0.879
## F-statistic: 57.28 on 4 and 27 DF, p-value: 8.424e-13
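Because `optimized.model` has no main effect for `am`, both transmission types share a single intercept and differ only in their slopes for `wt` and `qsec`. The sketch below shows what this means for one hypothetical car with `wt = 3` (3,000 lb) and `qsec = 18` seconds. These values are chosen purely for illustration. The terms are written as `am:wt` instead of `factor(am):wt` because `am` is already a factor in the copy `cars_df`. This gives the same fit, with coefficient names such as `amManual:wt`.

```r
cars_df <- datasets::mtcars
cars_df$am <- factor(cars_df$am, levels = c(0, 1),
                     labels = c("Automatic", "Manual"))

# Same specification as optimized.model (am is already a factor here)
fit <- lm(mpg ~ am:wt + am:qsec, data = cars_df)
b <- coef(fit)

# Hypothetical car (illustrative values): 3,000 lb, 18 s quarter-mile time
new_cars <- data.frame(
  am   = factor(c("Automatic", "Manual"), levels = levels(cars_df$am)),
  wt   = 3,
  qsec = 18
)
pred <- predict(fit, newdata = new_cars)

# The manual-car prediction written out from the coefficients:
# shared intercept plus the manual-specific slopes
pred_manual_by_hand <- b[["(Intercept)"]] + b[["amManual:wt"]] * 3 +
  b[["amManual:qsec"]] * 18

print(setNames(pred, levels(cars_df$am)))
```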
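The optimized model has a multiple R-squared of 0.89 (adjusted R-squared 0.879). This is higher than the stepwise model (0.850, adjusted 0.834), so the optimized model explains about 89% of the variance in mpg.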
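Adjusted R-squared penalizes extra coefficients. This matters here because the optimized model estimates five coefficients and the stepwise model only four. The sketch below recomputes both statistics from their definitions: R-squared = 1 - RSS/TSS and adjusted R-squared = 1 - (1 - R-squared)(n - 1)/(n - p - 1), where p is the number of predictors excluding the intercept. It assumes a standalone copy named `cars_df` and the same model specification as `optimized.model`.

```r
cars_df <- datasets::mtcars
cars_df$am <- factor(cars_df$am, levels = c(0, 1),
                     labels = c("Automatic", "Manual"))

fit <- lm(mpg ~ am:wt + am:qsec, data = cars_df)

n   <- nrow(cars_df)
p   <- length(coef(fit)) - 1                  # predictors, excluding the intercept
rss <- sum(residuals(fit)^2)                  # residual sum of squares
tss <- sum((cars_df$mpg - mean(cars_df$mpg))^2)  # total sum of squares

r2     <- 1 - rss / tss
adj_r2 <- 1 - (1 - r2) * (n - 1) / (n - p - 1)

print(c(r2 = r2, adj_r2 = adj_r2))
```

As a check against the printed output: with n = 32, p = 4 and R-squared = 0.8946, the adjusted value is 1 - 0.1054 * 31/27, which is about 0.879.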
# Residual diagnostic plots for the optimized model, in a 2 x 2 grid
par(mfrow=c(2,2))
plot(optimized.model)
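4.1) The results indicate that cars with automatic transmission have a statistically significantly lower fuel efficiency (fewer miles per gallon, i.e. higher fuel consumption) than cars with manual transmission.
4.2) The optimized model has an R-squared of 0.89 (adjusted R-squared 0.88). It explains about 89% of the variability in mpg, which indicates a good fit.
4.3) The type of transmission is not the only variable that determines fuel consumption. Fuel consumption also depends on the weight of the car (wt) and on its acceleration, measured as the quarter-mile time (qsec). In the optimized model, each additional 1,000 lb lowers mpg by about 6.10 for manual cars and about 3.18 for automatic cars.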