The main goal of this project is to explore the relationship between a set of variables and miles per gallon (MPG), the outcome. We are particularly interested in the following two questions:
1. Is an automatic or manual transmission better for MPG?
2. How large is the MPG difference between automatic and manual transmissions?
The questions will be addressed using regression models and exploratory data analyses in the following sections.
library(datasets)
data(mtcars)
mtcars$am <- factor(mtcars$am, labels = c("Automatic", "Manual"))
Pairwise relationships between the variables of interest are plotted in Figure 1 (see Appendix). The boxplot of mpg by transmission type shows that cars with manual transmission have higher mpg on average than cars with automatic transmission (Figure 2, see Appendix).
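The difference visible in the boxplot can also be summarised numerically. The following is a minimal sketch, not part of the original analysis; the names `mt`, `group_means`, and `group_medians` are introduced here for illustration, and a fresh copy of the data is used so that the result does not depend on earlier recoding of `am`.

```r
# Fresh copy of the data, independent of any earlier changes to mtcars.
mt <- datasets::mtcars
mt$am <- factor(mt$am, levels = c(0, 1), labels = c("Automatic", "Manual"))

# Mean and median mpg for each transmission type.
group_means   <- tapply(mt$mpg, mt$am, mean)
group_medians <- tapply(mt$mpg, mt$am, median)

print(round(group_means, 3))
print(round(group_medians, 3))
```

The boxplot itself displays medians and quartiles, so the group means are printed separately for comparison with the regression output.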
rm_single <- lm(mpg ~ am, data = mtcars)
summary(rm_single)
##
## Call:
## lm(formula = mpg ~ am, data = mtcars)
##
## Residuals:
##     Min      1Q  Median      3Q     Max
## -9.3923 -3.0923 -0.2974  3.2439  9.5077
##
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept)   17.147      1.125  15.247 1.13e-15 ***
## amManual       7.245      1.764   4.106 0.000285 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 4.902 on 30 degrees of freedom
## Multiple R-squared: 0.3598, Adjusted R-squared: 0.3385
## F-statistic: 16.86 on 1 and 30 DF, p-value: 0.000285
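As a hedged check on how to read the two coefficients above: with a single two-level factor as predictor, the intercept is the mean of the reference level (Automatic) and the `amManual` coefficient is the difference in means between the two levels. The sketch below, which is not part of the original analysis, verifies this and adds a 95% confidence interval for the difference; `mt`, `fit`, `b`, `means`, and `ci` are illustrative names.

```r
# Fresh copy of the data with am recoded as in the report.
mt <- datasets::mtcars
mt$am <- factor(mt$am, levels = c(0, 1), labels = c("Automatic", "Manual"))

fit   <- lm(mpg ~ am, data = mt)
b     <- coef(fit)
means <- tapply(mt$mpg, mt$am, mean)

# Intercept equals the Automatic mean;
# intercept + amManual equals the Manual mean.
print(all.equal(unname(b["(Intercept)"]), unname(means["Automatic"])))
print(all.equal(unname(b["(Intercept)"] + b["amManual"]), unname(means["Manual"])))

# 95% confidence interval for the Manual minus Automatic difference.
ci <- confint(fit, "amManual", level = 0.95)
print(ci)
```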
The model shows a substantial difference in mpg between the two transmission types. The intercept (beta0 = 17.147) is the mean mpg for automatic cars, and the amManual coefficient (beta1 = 7.245) is the difference in mean mpg between manual and automatic cars, so the mean for manual cars is 17.147 + 7.245 = 24.392. Since the p-value (0.000285) is far below 0.05, we reject the null hypothesis that the two transmission types have the same mean mpg. However, the adjusted R-squared shows that this model explains only 33.85% of the variance in mpg, which suggests that other predictors matter. Transmission type may also be confounded with other characteristics of the cars, so we select candidate confounding variables using a correlation analysis. The correlations between mpg (miles per gallon) and the other variables are:
data(mtcars)  # reload the original data so that am is numeric again for cor()
cor(mtcars)[1,]
##        mpg        cyl       disp         hp       drat         wt
##  1.0000000 -0.8521620 -0.8475514 -0.7761684  0.6811719 -0.8676594
##       qsec         vs         am       gear       carb
##  0.4186840  0.6640389  0.5998324  0.4802848 -0.5509251
It shows that number of cylinders (cyl), weight (wt), displacement (disp), and gross horsepower (hp) have the strongest correlations with mpg (all with |r| > 0.77). Hence, we include these as confounding variables in the regression analysis.
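The selection of the most strongly correlated variables can be made explicit by ranking the correlations by absolute value. This is a small sketch under the assumption that absolute Pearson correlation is the selection criterion; the names `mt`, `r`, and `ranked` are illustrative.

```r
# Fresh copy of the data (all columns numeric, as cor() requires).
mt <- datasets::mtcars

# Correlation of every variable with mpg, dropping mpg itself.
r <- cor(mt)["mpg", ]
r <- r[names(r) != "mpg"]

# Rank by strength of association, regardless of sign.
ranked <- r[order(abs(r), decreasing = TRUE)]
print(round(ranked, 3))
```

Ranking by absolute value matters because the strongest associations with mpg here are negative.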
rm_multi <- lm(mpg ~ cyl + disp + hp + wt + am, data = mtcars)
summary(rm_multi)
##
## Call:
## lm(formula = mpg ~ cyl + disp + hp + wt + am, data = mtcars)
##
## Residuals:
##     Min      1Q  Median      3Q     Max
## -3.5952 -1.5864 -0.7157  1.2821  5.5725
##
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept) 38.20280    3.66910  10.412 9.08e-11 ***
## cyl         -1.10638    0.67636  -1.636  0.11393
## disp         0.01226    0.01171   1.047  0.30472
## hp          -0.02796    0.01392  -2.008  0.05510 .
## wt          -3.30262    1.13364  -2.913  0.00726 **
## am           1.55649    1.44054   1.080  0.28984
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.505 on 26 degrees of freedom
## Multiple R-squared: 0.8551, Adjusted R-squared: 0.8273
## F-statistic: 30.7 on 5 and 26 DF, p-value: 4.029e-10
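The adjusted R-squared shows that this model explains 82.73% of the variance in mpg. After adjusting for cyl, disp, hp, and wt, the coefficient for am falls to 1.556 and is no longer statistically significant (p = 0.290); weight is the only predictor significant at the 0.05 level.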
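Because the single-predictor model is nested in the larger one, the improvement in fit can also be assessed with an F-test. The sketch below is an addition, not part of the original analysis; it refits both models on a fresh copy of the data with `am` kept numeric (0/1), which gives the same fitted values as the factor coding. The names `fit_single`, `fit_multi`, and `cmp` are illustrative.

```r
# Fresh copy of the data; am stays numeric (0 = automatic, 1 = manual).
mt <- datasets::mtcars

fit_single <- lm(mpg ~ am, data = mt)
fit_multi  <- lm(mpg ~ cyl + disp + hp + wt + am, data = mt)

# F-test of the null hypothesis that cyl, disp, hp and wt add nothing
# beyond transmission type.
cmp <- anova(fit_single, fit_multi)
print(cmp)

# Adjusted R-squared of both models, for comparison.
print(c(single = summary(fit_single)$adj.r.squared,
        multi  = summary(fit_multi)$adj.r.squared))
```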
Figure 3 shows the residual and diagnostic plots obtained by plotting the fitted multiple regression model object (see Appendix).
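The diagnostic plots can be complemented with numeric influence measures. The following sketch is an addition to the original analysis; the leverage cutoff of twice the number of coefficients divided by the sample size is a common rule of thumb assumed here, and the names `lev`, `cook`, `cutoff`, and `high_leverage` are illustrative.

```r
# Fresh copy of the data and the same model formula as in the report.
mt <- datasets::mtcars
fit_multi <- lm(mpg ~ cyl + disp + hp + wt + am, data = mt)

# Leverage (hat values) and Cook's distance for every car.
lev  <- hatvalues(fit_multi)
cook <- cooks.distance(fit_multi)

# Rule-of-thumb leverage cutoff (an assumption): 2 * p / n.
cutoff <- 2 * length(coef(fit_multi)) / nrow(mt)
high_leverage <- names(lev)[lev > cutoff]
print(high_leverage)

# The three cars with the largest Cook's distance.
print(round(head(sort(cook, decreasing = TRUE), 3), 3))
```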
The boxplot from the exploratory data analysis and the simple regression analysis both show that manual transmission has higher MPG than automatic transmission, by 7.245 MPG on average when no other variables are considered.
Holding cylinders (cyl), weight (wt), displacement (disp), and gross horsepower (hp) constant as confounding variables, manual transmission gives an estimated 1.56 MPG more than automatic transmission, but this difference is not statistically significant (p = 0.290).
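The uncertainty in the adjusted transmission effect can be expressed as a confidence interval. This is a minimal sketch added for illustration, refitting the multiple regression on a fresh copy of the data; the names `mt`, `fit_multi`, and `ci_am` are not from the original analysis.

```r
# Fresh copy of the data; am is numeric (0 = automatic, 1 = manual).
mt <- datasets::mtcars
fit_multi <- lm(mpg ~ cyl + disp + hp + wt + am, data = mt)

# 95% confidence interval for the adjusted manual-vs-automatic difference.
ci_am <- confint(fit_multi, "am", level = 0.95)
print(round(ci_am, 2))

# TRUE when the interval contains zero, i.e. the data are compatible
# with no adjusted difference between transmission types.
print(ci_am[1] < 0 && ci_am[2] > 0)
```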
pairs(mpg ~., data = mtcars, main = "Figure 1: Motor Trend Car Road Tests")
boxplot(mpg ~ am, data = mtcars, col = "bisque",
xlab = "Transmission type", ylab = "Miles Per Gallon (mpg)",
main = "Figure 2: Miles per gallon by transmission type")
par(mfrow = c(2,2))
plot(rm_multi, main = "Figure 3: Residuals and Diagnostics")