by Davin Kaing
This report compares the two types of transmission, automatic and manual, by their miles per gallon (MPG), using the Motor Trend Car Road Tests dataset (mtcars). The comparison takes into account the variables that are highly correlated with MPG: number of cylinders, displacement, gross horsepower, rear axle ratio, and weight. The results of this analysis show that manual transmission has a higher MPG than automatic.
The following explores the correlation between variables in the dataset. The command ‘ggpairs’, from the GGally package, was used to plot the pairwise relationships between variables. A summary of the linear model of MPG on all other variables is also provided. The variance inflation factors (VIFs) were also calculated to assess how strongly the predictors are correlated with one another.
data(mtcars)
library(GGally)
# Pairs plots of mpg against the other variables, with loess smoothers in the lower panels
ggpairs(mtcars, columns = 1:6, lower = list(continuous = wrap("smooth", method = "loess")))
ggpairs(mtcars, columns = c(1, 7:11), lower = list(continuous = wrap("smooth", method = "loess")))
summary(lm(mpg~., data = mtcars))
##
## Call:
## lm(formula = mpg ~ ., data = mtcars)
##
## Residuals:
##     Min      1Q  Median      3Q     Max
## -3.4506 -1.6044 -0.1196  1.2193  4.6271
##
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept) 12.30337   18.71788   0.657   0.5181
## cyl         -0.11144    1.04502  -0.107   0.9161
## disp         0.01334    0.01786   0.747   0.4635
## hp          -0.02148    0.02177  -0.987   0.3350
## drat         0.78711    1.63537   0.481   0.6353
## wt          -3.71530    1.89441  -1.961   0.0633 .
## qsec         0.82104    0.73084   1.123   0.2739
## vs           0.31776    2.10451   0.151   0.8814
## am           2.52023    2.05665   1.225   0.2340
## gear         0.65541    1.49326   0.439   0.6652
## carb        -0.19942    0.82875  -0.241   0.8122
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.65 on 21 degrees of freedom
## Multiple R-squared: 0.869, Adjusted R-squared: 0.8066
## F-statistic: 13.93 on 10 and 21 DF, p-value: 3.793e-07
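As a numeric complement to the pairs plots, the correlations with MPG can also be ranked directly. The following is a minimal sketch, not part of the original analysis; the names `cor_with_mpg` and `ranked` are illustrative. The five variables named in the introduction (wt, cyl, disp, hp, drat) should appear at the top of the ranking.

```r
data(mtcars)

# Pearson correlation of every other variable with mpg
cor_with_mpg <- cor(mtcars)[, "mpg"]
cor_with_mpg <- cor_with_mpg[names(cor_with_mpg) != "mpg"]

# Rank by absolute correlation, strongest first
ranked <- cor_with_mpg[order(abs(cor_with_mpg), decreasing = TRUE)]
round(ranked, 3)
```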
fit <- lm(mpg~., data = mtcars)
library(car)
vif(fit)
##       cyl      disp        hp      drat        wt      qsec        vs
## 15.373833 21.620241  9.832037  3.374620 15.164887  7.527958  4.965873
##        am      gear      carb
##  4.648487  5.357452  7.908747
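For reference, the VIF of a predictor is 1 / (1 - R²), where R² comes from regressing that predictor on all the other predictors. The sketch below reproduces the values above without the car package; `predictors` and `vif_manual` are illustrative names, not part of the original analysis.

```r
data(mtcars)

predictors <- setdiff(names(mtcars), "mpg")

# For each predictor, regress it on the remaining predictors and
# convert the resulting R-squared into a variance inflation factor
vif_manual <- sapply(predictors, function(p) {
  others <- setdiff(predictors, p)
  r2 <- summary(lm(reformulate(others, response = p), data = mtcars))$r.squared
  1 / (1 - r2)
})
round(vif_manual, 3)
```

Values well above 5 to 10 (here cyl, disp, and wt) are commonly read as a sign of strong collinearity, which helps explain why few coefficients in the full model are individually significant.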
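From the exploratory analyses above, four nested models were created, starting from transmission type alone and adding the variables most highly correlated with MPG. To identify the best model, the command ‘anova’ is used.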
model1 <- lm(mpg~am, data = mtcars)
model2 <- lm(mpg~am+cyl+disp, data = mtcars)
model3 <- lm(mpg~am+cyl+disp+hp+drat, data = mtcars)
model4 <- lm(mpg~am+cyl+disp+hp+drat+wt, data = mtcars)
anova(model1,model2,model3,model4)
## Analysis of Variance Table
##
## Model 1: mpg ~ am
## Model 2: mpg ~ am + cyl + disp
## Model 3: mpg ~ am + cyl + disp + hp + drat
## Model 4: mpg ~ am + cyl + disp + hp + drat + wt
##   Res.Df    RSS Df Sum of Sq       F    Pr(>F)
## 1     30 720.90
## 2     28 252.08  2    468.82 36.0775 4.275e-08 ***
## 3     26 214.50  2     37.58  2.8923  0.074146 .
## 4     25 162.43  1     52.06  8.0130  0.009033 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
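The p-values from the ‘anova’ table show that the fit improves most clearly when cyl and disp are added (p < 0.001) and when wt is added (p = 0.009), while adding hp and drat is only marginally significant (p = 0.074). ‘model4’, which has the lowest residual sum of squares (162.43), is therefore taken as the best model.
To observe the difference in MPG between the transmission types, a Welch two-sample t-test was conducted. The test shows that manual transmission has a significantly higher mean MPG than automatic transmission (24.39 vs. 17.15, p = 0.0014), before adjusting for any other variable.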
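The F statistic in the last row of the table can be reproduced by hand from the residual sums of squares of ‘model3’ and ‘model4’. This is a minimal sketch of the nested-model F-test, with illustrative variable names (`rss3`, `rss4`, `df_extra`, `f_stat`, `p_value`); the models are refit so the block runs on its own.

```r
data(mtcars)

model3 <- lm(mpg ~ am + cyl + disp + hp + drat, data = mtcars)
model4 <- lm(mpg ~ am + cyl + disp + hp + drat + wt, data = mtcars)

# Residual sums of squares of the smaller and the larger model
rss3 <- deviance(model3)
rss4 <- deviance(model4)

# Number of parameters added when moving from model3 to model4
df_extra <- df.residual(model3) - df.residual(model4)

# F = (drop in RSS per added parameter) / (residual mean square of the larger model)
f_stat  <- ((rss3 - rss4) / df_extra) / (rss4 / df.residual(model4))
p_value <- pf(f_stat, df_extra, df.residual(model4), lower.tail = FALSE)

c(F = f_stat, p = p_value)
```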
Automatic <- mtcars[mtcars$am == 0,]
Manual <- mtcars[mtcars$am == 1, ]
t.test(Manual$mpg,Automatic$mpg)
##
## Welch Two Sample t-test
##
## data: Manual$mpg and Automatic$mpg
## t = 3.7671, df = 18.332, p-value = 0.001374
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## 3.209684 11.280194
## sample estimates:
## mean of x mean of y
##  24.39231  17.14737
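The Welch statistic and its degrees of freedom reported above can be recomputed from the group means and variances. The following is a sketch under the same grouping (am == 1 for manual, am == 0 for automatic); the variable names are illustrative.

```r
data(mtcars)

manual_mpg    <- mtcars$mpg[mtcars$am == 1]
automatic_mpg <- mtcars$mpg[mtcars$am == 0]

# Squared standard errors of the two group means
se2_manual    <- var(manual_mpg) / length(manual_mpg)
se2_automatic <- var(automatic_mpg) / length(automatic_mpg)

# Welch t statistic: difference in means over the unpooled standard error
t_stat <- (mean(manual_mpg) - mean(automatic_mpg)) / sqrt(se2_manual + se2_automatic)

# Welch-Satterthwaite approximation to the degrees of freedom
df_welch <- (se2_manual + se2_automatic)^2 /
  (se2_manual^2 / (length(manual_mpg) - 1) +
   se2_automatic^2 / (length(automatic_mpg) - 1))

# Two-sided p-value
p_value <- 2 * pt(-abs(t_stat), df_welch)

c(t = t_stat, df = df_welch, p = p_value)
```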
The following boxplot shows that the manual transmission has greater MPG than automatic transmission.
boxplot(Automatic$mpg, Manual$mpg, names = c("Automatic", "Manual"),
col = c("gold", "blue"), xlab = "Type of Transmission", ylab = "MPG", main = "Type of Transmission vs. MPG")
After observing the difference between the types of transmission, a summary of the best model is provided to quantify this difference.
summary(model4)
##
## Call:
## lm(formula = mpg ~ am + cyl + disp + hp + drat + wt, data = mtcars)
##
## Residuals:
##    Min     1Q Median     3Q    Max
## -3.437 -1.574 -0.688  1.310  5.551
##
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept) 36.04938    7.60553   4.740 7.31e-05 ***
## am           1.37506    1.56866   0.877  0.38906
## cyl         -1.03335    0.72405  -1.427  0.16590
## disp         0.01257    0.01195   1.052  0.30307
## hp          -0.02887    0.01444  -1.999  0.05658 .
## drat         0.48586    1.49495   0.325  0.74788
## wt          -3.27472    1.15685  -2.831  0.00903 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.549 on 25 degrees of freedom
## Multiple R-squared: 0.8557, Adjusted R-squared: 0.8211
## F-statistic: 24.72 on 6 and 25 DF, p-value: 2.266e-09
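The summary of the best model, ‘model4’, shows that, holding the number of cylinders, displacement, horsepower, rear axle ratio, and weight constant, manual transmission is associated with an increase of 1.375 MPG over automatic transmission. This adjusted difference is not statistically significant (p = 0.389).
The following are the residual and diagnostic plots for model4.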
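To gauge the uncertainty around the transmission coefficient, a 95% confidence interval can be taken from the fitted model. This is a minimal sketch, not part of the original analysis; the model is refit so the block is self-contained, and `am_estimate` and `am_ci` are illustrative names.

```r
data(mtcars)

model4 <- lm(mpg ~ am + cyl + disp + hp + drat + wt, data = mtcars)

# Point estimate: adjusted MPG difference, manual (am = 1) minus automatic (am = 0)
am_estimate <- coef(model4)[["am"]]

# 95% confidence interval for that difference
am_ci <- confint(model4, "am", level = 0.95)

am_estimate
am_ci
```

An interval that includes zero is consistent with the non-significant p-value for am in the summary above.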
par(mfrow = c(2,2));
plot(model4)
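The diagnostic plots can be supplemented with a numeric look at the most influential observations. The following is a sketch using leverage, Cook's distance, and standardized residuals; the data frame `influence_summary` and its column names are illustrative.

```r
data(mtcars)

model4 <- lm(mpg ~ am + cyl + disp + hp + drat + wt, data = mtcars)

# Per-car influence measures from the fitted model
influence_summary <- data.frame(
  leverage  = hatvalues(model4),
  cooks_d   = cooks.distance(model4),
  std_resid = rstandard(model4)
)

# The three cars with the largest Cook's distance
head(influence_summary[order(influence_summary$cooks_d, decreasing = TRUE), ], 3)
```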