Executive Summary

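This analysis uses regression models and exploratory data analysis of the mtcars dataset in R to answer two questions:

  1. Is a manual or an automatic transmission better for MPG?

  2. How large is the MPG difference between manual and automatic transmissions?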

First, a simple linear regression of MPG on transmission type was performed. It showed that cars with manual transmissions get 7.245 more MPG than automatics, on average. Next, a multiple linear regression of MPG on transmission type, horsepower, and weight was performed. After accounting for horsepower and weight, manual cars get an estimated 2.084 more MPG than automatics, on average, but this difference is not statistically significant (p = 0.141). Manual cars clearly have higher MPG in the raw data. However, we cannot be confident that transmission type itself, rather than horsepower and weight, is responsible for the difference.

Data

mpg <- mtcars$mpg    # response: miles per (US) gallon
# Transmission as a labeled factor (0 = automatic, 1 = manual)
am <- factor(mtcars$am, levels = c(0, 1), labels = c("Automatic", "Manual"))
head(mtcars, 2)
##               mpg cyl disp  hp drat    wt  qsec vs am gear carb
## Mazda RX4      21   6  160 110  3.9 2.620 16.46  0  1    4    4
## Mazda RX4 Wag  21   6  160 110  3.9 2.875 17.02  0  1    4    4

The variables of primary interest are: mpg: Miles/(US) gallon; am: Transmission (0 = automatic, 1 = manual).

Additional variables: cyl: Number of cylinders; disp: Displacement (cu. in.); hp: Gross horsepower; drat: Rear axle ratio; wt: Weight (1000 lbs); qsec: 1/4 mile time (seconds); vs: Engine shape (0 = V-shaped, 1 = straight); gear: Number of forward gears; carb: Number of carburetors.

Exploratory Data Analysis

The two plots in the Appendix (a frequency polygon and a QQ-plot) suggest that the distribution of mpg is approximately normal, without severe skew.
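The plots give only a visual impression. As a rough numerical complement, the sketch below computes a moment-based sample skewness and runs base R's `shapiro.test()` on mpg. The `sample_skewness()` helper is hypothetical and uses one of several common skewness definitions. A large Shapiro-Wilk p-value only means the test finds no evidence against normality in 32 observations. It does not mean that mpg is exactly normal.

```r
# Numerical companion to the appendix plots (base R only).
mpg_vals <- mtcars$mpg

# Hypothetical helper: moment-based sample skewness, i.e. the mean cubed
# deviation divided by the (population) standard deviation cubed.
# Values near 0 suggest symmetry; positive values indicate a right tail.
sample_skewness <- function(x) {
  d <- x - mean(x)
  mean(d^3) / mean(d^2)^(3 / 2)
}

skew_mpg <- sample_skewness(mpg_vals)

# Shapiro-Wilk test; null hypothesis: the data come from a normal distribution
sw_mpg <- shapiro.test(mpg_vals)

cat("Sample skewness of mpg:", round(skew_mpg, 3), "\n")
cat("Shapiro-Wilk W:", round(unname(sw_mpg$statistic), 4),
    " p-value:", signif(sw_mpg$p.value, 3), "\n")
```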

Linear regression between 2 variables: mpg and am

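\(\hat{y} = \beta_0 + \beta_1 x\)

where \(\hat{y}\) is the predicted MPG and \(x = 1\) for a manual transmission, \(0\) for an automatic.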
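With a single 0/1 predictor such as am, the least-squares estimates have a simple interpretation. \(\beta_0\) is the mean MPG of the automatic cars, and \(\beta_1\) is the manual mean minus the automatic mean. The sketch below checks this by comparing `coef()` from `lm()` with group means from `tapply()`. It uses the numeric `am` column of `mtcars`, so the slope is named `am` rather than `amManual`.

```r
# Fit the simple regression of mpg on the 0/1 transmission indicator
fit_simple <- lm(mpg ~ am, data = mtcars)
beta0 <- unname(coef(fit_simple)["(Intercept)"])
beta1 <- unname(coef(fit_simple)["am"])

# Group means of mpg for am = 0 (automatic) and am = 1 (manual)
group_means <- tapply(mtcars$mpg, mtcars$am, mean)
mean_auto   <- unname(group_means["0"])
mean_manual <- unname(group_means["1"])

cat("beta_0 =", round(beta0, 6), " automatic mean =", round(mean_auto, 6), "\n")
cat("beta_1 =", round(beta1, 6), " manual - automatic =",
    round(mean_manual - mean_auto, 6), "\n")

# The two descriptions agree up to floating-point error
stopifnot(isTRUE(all.equal(beta0, mean_auto)),
          isTRUE(all.equal(beta1, mean_manual - mean_auto)))
```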

The boxplot in the Appendix shows that manual cars have noticeably higher MPG than automatic cars.

fit <- lm(mpg~am)
summary(fit)$coefficients
##              Estimate Std. Error   t value     Pr(>|t|)
## (Intercept) 17.147368   1.124603 15.247492 1.133983e-15
## amManual     7.244939   1.764422  4.106127 2.850207e-04

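In this model, the intercept (17.147) is the average MPG of automatic cars. The amManual coefficient means that manual cars get 7.245 more MPG on average (about 24.392 MPG). This difference is statistically significant (p = 2.85e-04), but the model does not account for other variables, such as horsepower and weight, that may also affect MPG.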
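The significance test on the transmission coefficient is equivalent to a classical two-sample t-test that assumes equal variances in the two groups. The sketch below checks this with `t.test(..., var.equal = TRUE)`. It also prints a 95% confidence interval for the difference using `confint()`. As before, it refits with the numeric `am` column, so the coefficient is named `am`; the estimate is unchanged.

```r
# Simple regression with the numeric 0/1 am column
fit_simple <- lm(mpg ~ am, data = mtcars)
reg_t <- summary(fit_simple)$coefficients["am", "t value"]

# Pooled-variance two-sample t-test. The formula interface tests
# mean(am = 0) - mean(am = 1), so its statistic has the opposite sign.
tt <- t.test(mpg ~ am, data = mtcars, var.equal = TRUE)
tt_t <- unname(tt$statistic)

cat("Regression t:", round(reg_t, 4), " t-test t:", round(tt_t, 4), "\n")
cat("Regression p:", signif(summary(fit_simple)$coefficients["am", "Pr(>|t|)"], 4),
    " t-test p:", signif(tt$p.value, 4), "\n")

# 95% confidence interval for the manual-minus-automatic difference in mean MPG
print(confint(fit_simple, "am", level = 0.95))
```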

Multiple Linear Regression

\(\hat{y} = \beta_0 + \beta_1 x_1 + \dots + \beta_k x_k\)
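For reference, the least-squares estimates of \(\beta_0, \dots, \beta_k\) solve the normal equations \((X^\top X)\hat{\beta} = X^\top y\). Here \(X\) is the design matrix: a column of ones plus one column per predictor. The sketch below solves these equations directly for the am + hp + wt model used later and compares the result with `lm()`. It is for illustration only, since `lm()` itself uses a QR decomposition, which is numerically more stable.

```r
# Design matrix: intercept column plus am, hp and wt
X <- model.matrix(~ am + hp + wt, data = mtcars)
y <- mtcars$mpg

# Solve (X'X) beta = X'y; solve(A, b) avoids forming the inverse explicitly
beta_hat <- setNames(drop(solve(crossprod(X), crossprod(X, y))), colnames(X))

print(round(beta_hat, 6))

# Should match lm() up to floating-point error
beta_lm <- coef(lm(mpg ~ am + hp + wt, data = mtcars))
print(all.equal(beta_hat, beta_lm))
```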

We start from a model containing am, which is always kept because it is the variable of interest. We then use forward selection based on p-values to decide which other predictors to add:

  1. Among the remaining predictors, hp has the smallest p-value when added to the model (2.92e-08), so we add it.

  2. With am and hp in the model, wt has the smallest p-value when added (3.57e-03), so we add it.

  3. None of the remaining predictors has a p-value below 0.05 when added, so we stop.
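The selection steps above can be sketched as a small R function. `forward_select()` is a hypothetical helper, not a function from base R or any package. At each step it refits `lm()` once per remaining candidate and reads that candidate's p-value from `summary()`. It then adds the predictor with the smallest p-value if that value is below 0.05. The sketch assumes every predictor is numeric (true for `mtcars`), so each coefficient name matches its column name. The text reports that this procedure adds hp and then wt, and running the sketch is one way to check those steps. P-values produced by a data-driven search like this are optimistic, so they are best treated as a screening tool.

```r
# Hypothetical helper: forward selection by p-value, always keeping `start`.
forward_select <- function(data, response, start, alpha = 0.05) {
  selected <- start
  candidates <- setdiff(names(data), c(response, start))
  entry_p <- numeric(0)  # p-value of each variable at the step it entered
  repeat {
    if (length(candidates) == 0) break
    # p-value of each candidate when added to the current model
    pvals <- sapply(candidates, function(v) {
      f <- reformulate(c(selected, v), response = response)
      summary(lm(f, data = data))$coefficients[v, "Pr(>|t|)"]
    })
    best <- names(which.min(pvals))
    if (pvals[[best]] >= alpha) break   # nothing significant left: stop
    selected <- c(selected, best)
    entry_p[best] <- pvals[[best]]
    candidates <- setdiff(candidates, best)
  }
  list(selected = selected, entry_p = entry_p)
}

result <- forward_select(mtcars, response = "mpg", start = "am")
print(result$selected)
print(signif(result$entry_p, 3))
```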

The diagnostic plots in the Appendix show that the residuals of this model are approximately normal with roughly constant variance, so the model appears to be a reasonable fit. The fitted model is:

fit2 <- lm(mpg~am+hp+wt, data=mtcars)
summary(fit2)
## 
## Call:
## lm(formula = mpg ~ am + hp + wt, data = mtcars)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -3.4221 -1.7924 -0.3788  1.2249  5.5317 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 34.002875   2.642659  12.867 2.82e-13 ***
## am           2.083710   1.376420   1.514 0.141268    
## hp          -0.037479   0.009605  -3.902 0.000546 ***
## wt          -2.878575   0.904971  -3.181 0.003574 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2.538 on 28 degrees of freedom
## Multiple R-squared:  0.8399, Adjusted R-squared:  0.8227 
## F-statistic: 48.96 on 3 and 28 DF,  p-value: 2.908e-11
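A confidence interval makes the uncertainty in the am coefficient explicit. The sketch below applies `confint()` to the same model. The am estimate (2.084) is small relative to its standard error (1.376), so its 95% interval includes zero, which matches the p-value of 0.141.

```r
fit2 <- lm(mpg ~ am + hp + wt, data = mtcars)

# 95% confidence intervals for all coefficients
ci <- confint(fit2, level = 0.95)
print(round(ci, 3))

# An interval for am that straddles zero is consistent with p > 0.05:
# the data do not rule out "no transmission effect" after adjusting for hp and wt.
am_ci <- ci["am", ]
cat("am interval contains 0:", am_ci[1] < 0 && am_ci[2] > 0, "\n")
```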

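Conclusions

On its own, transmission type is strongly associated with MPG: manual cars average 7.245 more MPG than automatic cars (p = 2.85e-04). After adjusting for horsepower and weight, however, the estimated manual advantage drops to 2.084 MPG and is not statistically significant (p = 0.141). Both hp (p = 0.000546) and wt (p = 0.003574) remain significant, and the model explains about 84% of the variation in MPG (R-squared = 0.8399). Manual transmissions are therefore associated with better MPG, but the data do not show that transmission type improves MPG once horsepower and weight are taken into account.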

Appendix

Exploratory Data Analysis

library(ggplot2)
a <- ggplot(mtcars, aes(mpg))
a <- a + geom_freqpoly(bins = 30, colour="red") + ggtitle("Frequency Polygon of MPG")
a

n <- length(mpg)
probabilities <- (1:n) / (n + 1)    # plotting positions i/(n + 1)
normal.quantiles <- qnorm(probabilities, mean(mpg), sd(mpg))  # normal quantiles on the mpg scale
plot(sort(normal.quantiles), sort(mpg), pch = 19, col = "blue", xlab = "Theoretical",
     ylab = "Sample", main = "QQ-Plot of MPG")
abline(0, 1)                        # reference line y = x
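Base R's `qqnorm()` and `qqline()` produce a similar plot with less code. There are two differences from the hand-built version above. First, the x-axis shows standard normal quantiles rather than quantiles rescaled by the mean and sd of mpg. Second, the plotting positions come from `ppoints()` rather than i/(n + 1). The `qqline()` reference line passes through the first and third quartiles, so it is a visual aid rather than a fitted line.

```r
mpg_vals <- mtcars$mpg

# Sample quantiles vs. standard normal quantiles; the coordinates are
# also returned invisibly as a list with components x and y
qq <- qqnorm(mpg_vals, pch = 19, col = "blue", main = "Normal QQ-Plot of MPG")

# Reference line through the first and third quartiles
qqline(mpg_vals)
```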

Linear regression between 2 variables: mpg and am

boxplot(mpg~am, main = "Effect of Transmission Type on MPG", xlab = "Transmission", 
        ylab = "Miles per Gallon")
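The boxplot can be backed by the numbers it draws. As a sketch, `fivenum()` gives Tukey's five-number summary for each transmission group: minimum, lower hinge, median, upper hinge, and maximum. The box and median line of the boxplot come from these hinges. Its whiskers can stop short of the minimum or maximum when points are drawn as outliers.

```r
# Label the 0/1 transmission codes (0 = automatic, 1 = manual)
trans <- factor(mtcars$am, levels = c(0, 1), labels = c("Automatic", "Manual"))

# Tukey five-number summary of mpg within each group
by_trans <- lapply(split(mtcars$mpg, trans), fivenum)
print(by_trans)

# Group sizes
print(table(trans))
```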

Multiple Linear Regression

par(mfrow = c(2,2))
plot(fit2)
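The diagnostic plots can be complemented by simple numbers. The sketch below runs `shapiro.test()` on the residuals of the am + hp + wt model. As an informal heuristic for non-constant variance (an assumption of this sketch, not a standard named test), it also computes a Spearman correlation between the absolute residuals and the fitted values. A strong positive correlation would suggest that the spread grows with fitted MPG.

```r
fit2 <- lm(mpg ~ am + hp + wt, data = mtcars)
res <- residuals(fit2)

# Normality of residuals; null hypothesis: residuals are normally distributed
sw_res <- shapiro.test(res)
cat("Shapiro-Wilk p-value for residuals:", signif(sw_res$p.value, 3), "\n")

# Informal spread check (heuristic, not a formal heteroscedasticity test):
# rank correlation between |residual| and fitted value
spread_trend <- cor.test(abs(res), fitted(fit2), method = "spearman", exact = FALSE)
cat("Spearman rho (|residual| vs fitted):", round(unname(spread_trend$estimate), 3),
    " p-value:", signif(spread_trend$p.value, 3), "\n")
```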