Hossam Saad
July 3, 2020
In this report, we examine the mtcars data set and explore how miles per gallon (MPG) is affected by the other variables. We address two questions: (1) Is an automatic or a manual transmission better for MPG? (2) How large is the MPG difference between automatic and manual transmissions?
From our analysis, manual transmissions have an MPG about 1.8 higher than automatic transmissions once cylinders, displacement, horsepower, and weight are accounted for, although this adjusted difference is not statistically significant (p = 0.2155).
#display data
library(ggplot2)
data(mtcars)
head(mtcars)
##                    mpg cyl disp  hp drat    wt  qsec vs am gear carb
## Mazda RX4         21.0   6  160 110 3.90 2.620 16.46  0  1    4    4
## Mazda RX4 Wag     21.0   6  160 110 3.90 2.875 17.02  0  1    4    4
## Datsun 710        22.8   4  108  93 3.85 2.320 18.61  1  1    4    1
## Hornet 4 Drive    21.4   6  258 110 3.08 3.215 19.44  1  0    3    1
## Hornet Sportabout 18.7   8  360 175 3.15 3.440 17.02  0  0    3    2
## Valiant           18.1   6  225 105 2.76 3.460 20.22  1  0    3    1
We convert the categorical variables to factors.
mtcars$cyl <- factor(mtcars$cyl)
mtcars$vs <- factor(mtcars$vs)
mtcars$gear <- factor(mtcars$gear)
mtcars$carb <- factor(mtcars$carb)
mtcars$am <- factor(mtcars$am,labels=c("Automatic","Manual"))
#data after transformation
head(mtcars)
##                    mpg cyl disp  hp drat    wt  qsec vs        am gear carb
## Mazda RX4         21.0   6  160 110 3.90 2.620 16.46  0    Manual    4    4
## Mazda RX4 Wag     21.0   6  160 110 3.90 2.875 17.02  0    Manual    4    4
## Datsun 710        22.8   4  108  93 3.85 2.320 18.61  1    Manual    4    1
## Hornet 4 Drive    21.4   6  258 110 3.08 3.215 19.44  1 Automatic    3    1
## Hornet Sportabout 18.7   8  360 175 3.15 3.440 17.02  0 Automatic    3    2
## Valiant           18.1   6  225 105 2.76 3.460 20.22  1 Automatic    3    1
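A quick sanity check on the conversion can be sketched as follows. This is a minimal, self-contained example that works on a separate copy of the data; the names `cars` and `factor_cols` are introduced here for illustration and are not part of the original analysis. It spells out `levels = c(0, 1)` so that the mapping of 0 to Automatic and 1 to Manual is explicit rather than implied by sort order.

```r
# Work on a copy so the report's own mtcars object is left untouched
# (`cars` is a hypothetical name used only in this sketch).
cars <- datasets::mtcars

# Convert the categorical columns in one step.
factor_cols <- c("cyl", "vs", "gear", "carb")
cars[factor_cols] <- lapply(cars[factor_cols], factor)

# In mtcars, am is coded 0 = automatic and 1 = manual.
cars$am <- factor(cars$am, levels = c(0, 1), labels = c("Automatic", "Manual"))

# Inspect the resulting column classes and the number of cars per group.
print(sapply(cars, class))
print(table(cars$am))
```

Checking the group sizes is worthwhile here because the two transmission groups are not the same size, which is one reason the later t-test does not assume equal variances.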
Next, we build exploratory plots. Plot 1 shows a clear relationship between transmission type and MPG, with automatic transmissions having lower MPG.
We have seen visually that manual transmissions are better for MPG; we now quantify this difference.
aggregate(mpg~am, data = mtcars, mean)
## am mpg
## 1 Automatic 17.14737
## 2 Manual 24.39231
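The gap between the two group means printed above can also be computed directly. The sketch below is one possible way to do it, using a fresh copy of the data in which `am` is still coded 0 (automatic) and 1 (manual); `cars`, `group_means`, and `mpg_gap` are illustrative names, not part of the original analysis.

```r
# Fresh copy of the data; am is numeric here (0 = automatic, 1 = manual).
cars <- datasets::mtcars

# Mean MPG per transmission group; the result is named "0" and "1".
group_means <- tapply(cars$mpg, cars$am, mean)

# Manual minus automatic.
mpg_gap <- unname(group_means["1"] - group_means["0"])
print(round(mpg_gap, 2))
```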
Automatic cars have a mean MPG about 7.24 lower than manual cars (24.39 - 17.15). We use a t-test to determine whether this difference is statistically significant.
#split the data by transmission type
Automatic_car <- mtcars[mtcars$am == "Automatic",]
Manual_car <- mtcars[mtcars$am == "Manual",]
t.test(Automatic_car$mpg, Manual_car$mpg)
##
## Welch Two Sample t-test
##
## data: Automatic_car$mpg and Manual_car$mpg
## t = -3.7671, df = 18.332, p-value = 0.001374
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -11.280194 -3.209684
## sample estimates:
## mean of x mean of y
## 17.14737 24.39231
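The same Welch test can be written with the formula interface of `t.test`, which avoids creating two separate data frames, and its components can be pulled out of the returned object instead of being read off the printout. This is a sketch on a separate copy of the data; `cars` and `welch` are names assumed for illustration.

```r
# Separate copy with am recoded as a labelled factor.
cars <- datasets::mtcars
cars$am <- factor(cars$am, levels = c(0, 1), labels = c("Automatic", "Manual"))

# Welch two-sample t-test (var.equal = FALSE is the default).
welch <- t.test(mpg ~ am, data = cars)

# Individual components of the test result.
print(welch$p.value)   # p-value of the test
print(welch$conf.int)  # 95% confidence interval for automatic minus manual
print(welch$estimate)  # the two group means
```

Because Automatic is the first factor level, the confidence interval refers to the automatic mean minus the manual mean, which is why both of its ends are negative.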
The p-value is 0.001374, well below 0.05, so the difference in mean MPG is statistically significant. We now quantify it with a simple linear regression.
fiting_SimpleReg <- lm(mpg ~ am, data = mtcars)
summary(fiting_SimpleReg)
##
## Call:
## lm(formula = mpg ~ am, data = mtcars)
##
## Residuals:
## Min 1Q Median 3Q Max
## -9.3923 -3.0923 -0.2974 3.2439 9.5077
##
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept)   17.147      1.125  15.247 1.13e-15 ***
## amManual       7.245      1.764   4.106 0.000285 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 4.902 on 30 degrees of freedom
## Multiple R-squared: 0.3598, Adjusted R-squared: 0.3385
## F-statistic: 16.86 on 1 and 30 DF, p-value: 0.000285
This shows that automatic cars average 17.1 MPG, while manual cars average 7.2 MPG higher. The R-squared value is 0.36, which tells us that this model explains only 36% of the variance in MPG. We therefore build a multiple linear regression.
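With a single two-level factor as the predictor, the slope of the regression is simply the difference between the two group means. The sketch below illustrates this and also extracts a confidence interval for the slope; it refits the model on a separate copy of the data, and `cars`, `fit_simple`, `slope`, and `gap` are hypothetical names used only here.

```r
# Separate copy with am recoded as a labelled factor.
cars <- datasets::mtcars
cars$am <- factor(cars$am, levels = c(0, 1), labels = c("Automatic", "Manual"))

# Same model form as in the report: MPG explained by transmission type only.
fit_simple <- lm(mpg ~ am, data = cars)

# The amManual coefficient equals the difference in group means.
slope <- unname(coef(fit_simple)["amManual"])
gap <- mean(cars$mpg[cars$am == "Manual"]) - mean(cars$mpg[cars$am == "Automatic"])
print(all.equal(slope, gap))

# 95% confidence interval for the manual-vs-automatic difference.
print(confint(fit_simple)["amManual", ])

# Share of the variance in MPG explained by transmission type alone.
r_squared <- summary(fit_simple)$r.squared
print(round(r_squared, 4))
```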
The new model adds other variables to improve the fit. We explore the other variables with a pairs plot (Plot 2) to see how each one relates to mpg. From this we see that cyl, disp, hp, and wt have the strongest relationships with mpg. We build a new model using these variables and compare it to the initial model with the anova function.
Fiting_MultipuleReg <- lm(mpg~am + cyl + disp + hp + wt, data = mtcars)
anova(fiting_SimpleReg, Fiting_MultipuleReg)
## Analysis of Variance Table
##
## Model 1: mpg ~ am
## Model 2: mpg ~ am + cyl + disp + hp + wt
##   Res.Df    RSS Df Sum of Sq      F    Pr(>F)
## 1     30 720.90
## 2     25 150.41  5    570.49 18.965 8.637e-08 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
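The F statistic in this table compares the drop in residual sum of squares against the residual variance of the larger model. As a sketch of where the number comes from, it can be reproduced by hand from the two fitted models. The models are refitted here on a separate copy of the data, and all object names (`cars`, `fit_simple`, `fit_multi`, and so on) are assumptions made for this illustration.

```r
# Separate copy with the same factor coding the report uses for am and cyl.
cars <- datasets::mtcars
cars$am <- factor(cars$am, levels = c(0, 1), labels = c("Automatic", "Manual"))
cars$cyl <- factor(cars$cyl)

fit_simple <- lm(mpg ~ am, data = cars)
fit_multi <- lm(mpg ~ am + cyl + disp + hp + wt, data = cars)

# Residual sums of squares and degrees of freedom of both models.
rss_simple <- sum(residuals(fit_simple)^2)
rss_multi <- sum(residuals(fit_multi)^2)
df_multi <- df.residual(fit_multi)
df_diff <- df.residual(fit_simple) - df_multi

# F = (reduction in RSS per added parameter) / (residual variance of larger model).
f_stat <- ((rss_simple - rss_multi) / df_diff) / (rss_multi / df_multi)
p_value <- pf(f_stat, df_diff, df_multi, lower.tail = FALSE)

print(c(F = f_stat, p = p_value))
```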
This results in a p-value of 8.637e-08, so we can claim that the Fiting_MultipuleReg model fits significantly better than our fiting_SimpleReg model. We check the residuals for non-normality and heteroskedasticity (Plot 3); they appear approximately normally distributed and homoskedastic.
summary(Fiting_MultipuleReg)
##
## Call:
## lm(formula = mpg ~ am + cyl + disp + hp + wt, data = mtcars)
##
## Residuals:
## Min 1Q Median 3Q Max
## -3.9374 -1.3347 -0.3903 1.1910 5.0757
##
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)
## (Intercept) 33.864276   2.695416  12.564 2.67e-12 ***
## amManual     1.806099   1.421079   1.271   0.2155
## cyl6        -3.136067   1.469090  -2.135   0.0428 *
## cyl8        -2.717781   2.898149  -0.938   0.3573
## disp         0.004088   0.012767   0.320   0.7515
## hp          -0.032480   0.013983  -2.323   0.0286 *
## wt          -2.738695   1.175978  -2.329   0.0282 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.453 on 25 degrees of freedom
## Multiple R-squared: 0.8664, Adjusted R-squared: 0.8344
## F-statistic: 27.03 on 6 and 25 DF, p-value: 8.861e-10
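The model explains 86.64% of the variance in MPG. Adjusting for cyl, disp, hp, and wt reduces the estimated effect of transmission type considerably (from 7.2 to 1.8 MPG), which indicates that these variables confound the relationship between mpg and am. Holding them fixed, the estimated difference between manual and automatic transmissions is 1.81 MPG, but with a p-value of 0.2155 this adjusted difference is not statistically significant.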
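The uncertainty around the adjusted transmission effect can be made concrete with a confidence interval. The following sketch refits the multiple regression on a separate copy of the data and extracts the interval for the `amManual` coefficient; `cars`, `fit_multi`, `am_estimate`, and `am_ci` are names assumed for illustration.

```r
# Separate copy with the same factor coding the report uses for am and cyl.
cars <- datasets::mtcars
cars$am <- factor(cars$am, levels = c(0, 1), labels = c("Automatic", "Manual"))
cars$cyl <- factor(cars$cyl)

fit_multi <- lm(mpg ~ am + cyl + disp + hp + wt, data = cars)

# Point estimate and 95% confidence interval for manual vs automatic,
# holding cyl, disp, hp, and wt fixed.
am_estimate <- unname(coef(fit_multi)["amManual"])
am_ci <- confint(fit_multi)["amManual", ]

print(round(am_estimate, 2))
print(round(am_ci, 2))
```

An interval that includes zero is consistent with the coefficient's p-value of 0.2155: the data do not rule out a zero difference once the other variables are held fixed.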
Plot 1 - Boxplot of MPG by transmission type
boxplot(mpg ~ am, data = mtcars, col = c("red", "green"), ylab = "Miles Per Gallon", xlab = "Transmission Type")
Plot 2 - Pairs plot for the data set
pairs(mpg ~ ., data = mtcars)
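As a numeric complement to the pairs plot, the Pearson correlation of each variable with mpg can be ranked by absolute value. This is only a rough cross-check under a stated assumption: it uses the original numeric coding of the data, so variables such as cyl and am are treated as numbers rather than factors. `cars`, `cors`, and `ranked` are illustrative names.

```r
# Original numeric coding of the data (no factor conversion).
cars <- datasets::mtcars

# Correlation of every variable with mpg, dropping mpg itself.
cors <- cor(cars)[, "mpg"]
cors <- cors[names(cors) != "mpg"]

# Strongest relationships first, regardless of sign.
ranked <- cors[order(abs(cors), decreasing = TRUE)]
print(round(ranked, 2))
```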
Plot 3 - Check residuals
par(mfrow = c(2,2))
plot(Fiting_MultipuleReg)
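The diagnostic plots are judged by eye. One way to supplement them, which is not part of the original analysis, is a Shapiro-Wilk test on the residuals. The sketch below refits the model on a separate copy of the data (`cars`, `fit_multi`, and `sw` are assumed names). It addresses normality only and says nothing about heteroskedasticity.

```r
# Separate copy with the same factor coding the report uses for am and cyl.
cars <- datasets::mtcars
cars$am <- factor(cars$am, levels = c(0, 1), labels = c("Automatic", "Manual"))
cars$cyl <- factor(cars$cyl)

fit_multi <- lm(mpg ~ am + cyl + disp + hp + wt, data = cars)

# Shapiro-Wilk test: the null hypothesis is that the residuals are normal,
# so a small p-value would be evidence against normality.
sw <- shapiro.test(residuals(fit_multi))
print(sw)
```

With only 32 observations the test has limited power, so it is best read together with the Q-Q plot rather than as a replacement for it.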