This analysis answers the following questions about the mtcars dataset in R using regression models and exploratory data analyses:
Is an automatic or manual transmission better for MPG?
Quantify the MPG difference between automatic and manual transmissions
First, a simple linear regression of MPG on transmission type was performed. This showed that cars with manual transmissions get 7.245 MPG more than cars with automatic transmissions, on average. Next, a multivariate linear regression of MPG on transmission, horsepower, and weight was performed. After adjusting for these variables, manual transmission cars get an estimated 2.084 MPG more than automatic cars, on average, although this adjusted difference is not statistically significant at the 0.05 level (p = 0.141). The analysis therefore suggests that manual transmissions are better for MPG, but much of the raw difference is accounted for by horsepower and weight.
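# Extract MPG and transmission type from mtcars
mpg <- mtcars$mpg
am <- mtcars$am
# Convert transmission to a labelled factor (0 = Automatic, 1 = Manual)
am <- as.factor(am)
levels(am) <- c("Automatic", "Manual")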
head(mtcars, 2)
##               mpg cyl disp  hp drat    wt  qsec vs am gear carb
## Mazda RX4      21   6  160 110  3.9 2.620 16.46  0  1    4    4
## Mazda RX4 Wag  21   6  160 110  3.9 2.875 17.02  0  1    4    4
The variables of primary interest in this dataset: mpg: Miles/(US) gallon; am: Transmission (0 = automatic, 1 = manual).
Additional variables: cyl: Number of cylinders; disp: Displacement (cu.in.); hp: Gross horsepower; drat: Rear axle ratio; wt: Weight (1000 lbs); qsec: 1/4 mile time (seconds); vs: Engine shape (0 = V-shaped, 1 = straight); gear: Number of forward gears; carb: Number of carburetors.
The two plots shown in the Appendix (a frequency polygon and a QQ-plot) illustrate that the distribution of mpg is approximately normal and not strongly skewed.
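As a numerical complement to the plots, a Shapiro-Wilk test can be run on mpg. This is a minimal sketch and not part of the original analysis; the 0.05 threshold is an assumption carried over from the significance level used elsewhere in this report.

```r
# Shapiro-Wilk test of normality for mpg (supplementary check, not in the original analysis)
sw <- shapiro.test(mtcars$mpg)
sw$statistic   # W statistic; values near 1 are consistent with normality
sw$p.value     # a p-value above 0.05 means normality is not rejected at that level
```

With only 32 observations the test has limited power, so it is best read alongside the plots rather than in place of them.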
\(\hat{y} = \beta_0 + \beta_1 x\)
The boxplot shown in the Appendix suggests that cars with manual transmissions have markedly higher MPG than cars with automatic transmissions. We quantify this difference with a simple linear regression of MPG on transmission type.
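Because the predictor in this model is a two-level indicator, the regression equation reduces to two group means. Taking \(x = 0\) for automatic and \(x = 1\) for manual (the coding of am in mtcars), it can be written out as:

```latex
\hat{y} =
\begin{cases}
\beta_0, & x = 0 \ \text{(automatic)} \\
\beta_0 + \beta_1, & x = 1 \ \text{(manual)}
\end{cases}
```

So \(\beta_0\) is the mean MPG of automatic cars and \(\beta_1\) is the difference in mean MPG between manual and automatic cars.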
# Simple linear regression of MPG on transmission type
fit <- lm(mpg ~ am)
summary(fit)$coefficients
##              Estimate Std. Error   t value     Pr(>|t|)
## (Intercept) 17.147368   1.124603 15.247492 1.133983e-15
## amManual     7.244939   1.764422  4.106127 2.850207e-04
A linear regression model shows us:
Cars with manual transmissions get 7.245 MPGs more than automatic, on average.
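Because the p-value for am is less than the significance level (p-value = 2.85e-04 < 0.05 = \(\alpha\)), we reject the null hypothesis that mean MPG is the same for both transmission types. This simple model does not account for other characteristics of the cars that may differ between the two groups, so we next fit a multivariate linear regression.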
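To express the uncertainty around the 7.245 MPG estimate, a 95% confidence interval can be computed from the same simple model. The sketch below refits the model so that it runs on its own; the names `am_f`, `fit_simple`, and `ci_simple` are introduced here for illustration and do not appear in the original code.

```r
# Rebuild the transmission factor and refit the simple model (illustrative names)
am_f <- factor(mtcars$am, labels = c("Automatic", "Manual"))
fit_simple <- lm(mtcars$mpg ~ am_f)

# 95% confidence interval for the manual-vs-automatic difference in mean MPG
ci_simple <- confint(fit_simple, "am_fManual", level = 0.95)
ci_simple
```

An interval that lies entirely above zero agrees with the small p-value reported for am in this model.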
To determine which predictors to include in our model, we use forward selection based on p-values, starting from the model that contains only am:
In the first step, hp has the smallest p-value (2.92e-08), so we add it to our model.
In the second step, wt has the smallest p-value (3.57e-03), so we add it to our model.
None of the remaining predictors can be added to the model (they all have a p-value > 0.05), so we stop adding variables.
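The selection steps above can be sketched as a loop. The original does not say which function was used to obtain the p-values, so the use of `add1()` with an F-test here is an assumption, as are the names `alpha`, `current`, and `candidates`. All candidate variables are treated as numeric, as they are stored in mtcars.

```r
# Forward selection by p-value, starting from mpg ~ am (illustrative sketch)
alpha <- 0.05
current <- lm(mpg ~ am, data = mtcars)
candidates <- setdiff(names(mtcars), c("mpg", "am"))

repeat {
  if (length(candidates) == 0) break
  # F-test for adding each remaining candidate to the current model, one at a time
  scope <- as.formula(paste("~ . +", paste(candidates, collapse = " + ")))
  tests <- add1(current, scope = scope, test = "F")
  pvals <- tests[["Pr(>F)"]][-1]        # drop the "<none>" row
  names(pvals) <- rownames(tests)[-1]
  best <- names(which.min(pvals))
  if (pvals[best] >= alpha) break       # no candidate is significant: stop
  current <- update(current, as.formula(paste(". ~ . +", best)))
  candidates <- setdiff(candidates, best)
}

formula(current)  # the selected model
```

Forward selection on p-values is sensitive to the starting model and to the threshold, so the selected formula should be compared against the steps reported above rather than taken for granted.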
The diagnostic plots shown in the Appendix illustrate that the residuals of this model are approximately normal and homoscedastic. Thus, our model is a reasonable fit and we can conclude our analysis:
fit2 <- lm(mpg~am+hp+wt, data=mtcars)
summary(fit2)
##
## Call:
## lm(formula = mpg ~ am + hp + wt, data = mtcars)
##
## Residuals:
##     Min      1Q  Median      3Q     Max
## -3.4221 -1.7924 -0.3788  1.2249  5.5317
##
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)
## (Intercept) 34.002875   2.642659  12.867 2.82e-13 ***
## am           2.083710   1.376420   1.514 0.141268
## hp          -0.037479   0.009605  -3.902 0.000546 ***
## wt          -2.878575   0.904971  -3.181 0.003574 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.538 on 28 degrees of freedom
## Multiple R-squared: 0.8399, Adjusted R-squared: 0.8227
## F-statistic: 48.96 on 3 and 28 DF, p-value: 2.908e-11
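To see how much uncertainty remains around the adjusted 2.084 MPG estimate, a confidence interval for the am coefficient can be read from the fitted model. This is a minimal sketch that refits the model so the block is self-contained; the names `fit_adj` and `ci_adj` are illustrative.

```r
# Refit the adjusted model (same formula as fit2)
fit_adj <- lm(mpg ~ am + hp + wt, data = mtcars)

# 95% confidence interval for the transmission effect after adjusting for hp and wt
ci_adj <- confint(fit_adj, "am", level = 0.95)
ci_adj

# TRUE means the interval spans zero, i.e. the adjusted difference is not
# distinguishable from zero at the 0.05 level
ci_adj[1] < 0 && ci_adj[2] > 0
```

Given the p-value of 0.141 for am in the summary above, an interval that spans zero is what one would expect.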
library(ggplot2)
a <- ggplot(mtcars, aes(mpg))
a <- a + geom_freqpoly(bins = 30, colour="red") + ggtitle("Frequency Polygon of MPG")
a
# Normal QQ-plot of MPG, built by hand from theoretical quantiles
n <- length(mpg)
probabilities <- (1:n) / (n + 1)
normal.quantiles <- qnorm(probabilities, mean(mpg), sd(mpg))
plot(sort(normal.quantiles), sort(mpg), pch = 19, col = "blue", xlab = "Theoretical",
     ylab = "Sample", main = "QQ-Plot of MPG")
abline(0, 1)  # reference line y = x
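# Boxplot of MPG by transmission type
boxplot(mpg ~ am, main = "Effect of Transmission Type on MPG", xlab = "Transmission",
        ylab = "Miles per Gallon")
# Residual diagnostics for the multivariate model, shown in a 2x2 grid
par(mfrow = c(2, 2))
plot(fit2)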