Executive Summary

While working for Motor Trend magazine, I was tasked with finding out which transmission type is better for MPG: manual or automatic. For this project I'll be using the mtcars dataset. We will start with some exploratory analysis to check multiple variables.

Synopsis

You work for Motor Trend, a magazine about the automobile industry. Looking at a data set of a collection of cars, they are interested in exploring the relationship between a set of variables and miles per gallon (MPG) (outcome). They are particularly interested in the following two questions:

  1. “Is an automatic or manual transmission better for MPG”
  2. “Quantify the MPG difference between automatic and manual transmissions”

Analysis

List of variables used

  mpg = miles per gallon
  wt = weight (1000 lbs)
  qsec = 1/4 mile time (seconds)
  am = transmission (0 = automatic, 1 = manual)

Exploratory Data Analysis

library(ggplot2)
data(mtcars)
mtcars[1:3, ]
##                mpg cyl disp  hp drat    wt  qsec vs am gear carb
## Mazda RX4     21.0   6  160 110 3.90 2.620 16.46  0  1    4    4
## Mazda RX4 Wag 21.0   6  160 110 3.90 2.875 17.02  0  1    4    4
## Datsun 710    22.8   4  108  93 3.85 2.320 18.61  1  1    4    1
library(data.table)
library(scales)
library(grid)
library(gridExtra)
library(MASS)

A histogram was used to check whether MPG is approximately normally distributed before running inference and regression analysis on it. The distribution is approximately normal. Box plots were used to compare the median MPG of the two transmission types.
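As a numeric complement to the plots, the sketch below runs a Shapiro-Wilk test on MPG and prints the median MPG per transmission type. The Shapiro-Wilk test is not part of the original analysis; it is added here only as an assumption about how the visual check could be backed up. The names `dat`, `normality` and `mpg_medians` are illustrative.

```r
# Fresh copy of the built-in data, so the session's mtcars is left untouched
dat <- datasets::mtcars
dat$am <- factor(dat$am, levels = c(0, 1), labels = c("AUTO", "MAN"))

# Shapiro-Wilk test: the null hypothesis is that mpg is normally distributed
normality <- shapiro.test(dat$mpg)
print(normality)

# Median MPG per transmission type, the quantity the box plots compare
mpg_medians <- tapply(dat$mpg, dat$am, median)
print(mpg_medians)
```

A p-value above 0.05 means the test finds no evidence against normality, which is weaker than proof that the data are normal.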

Regression, ANOVA and Residual Analysis

Linear Model

Intercept = 17.15 and slope = 7.24. Cars with an automatic transmission average 17.15 MPG, and a manual transmission is associated with an estimated increase of 7.24 MPG. Adjusted R-squared = 33.9% and p-value = 0.0003, so this model is significant, although transmission type alone explains only about a third of the variation in MPG.
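With a single two-level factor as the predictor, the intercept is the mean of the baseline group and the slope is the difference between the group means. The sketch below checks that reading against the raw group means; `dat`, `fit_am` and the other names are illustrative.

```r
# Fresh copy of the built-in data, so the session's mtcars is left untouched
dat <- datasets::mtcars
dat$am <- factor(dat$am, levels = c(0, 1), labels = c("AUTO", "MAN"))

fit_am <- lm(mpg ~ am, data = dat)
group_means <- tapply(dat$mpg, dat$am, mean)

# Intercept         = mean MPG of automatic cars (the baseline level)
# Intercept + slope = mean MPG of manual cars
auto_from_model <- unname(coef(fit_am)["(Intercept)"])
manual_from_model <- unname(coef(fit_am)["(Intercept)"] + coef(fit_am)["amMAN"])

print(round(c(AUTO = auto_from_model, MAN = manual_from_model), 2))
print(round(group_means, 2))
```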

Multi-Variate Regression

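The model with all variables explains about 81% of the variation in MPG (adjusted R-squared), but none of the individual coefficients has a significant p-value, so this model should be fit again with fewer predictors.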
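The two quantities discussed here, adjusted R-squared and the per-coefficient p-values, can be pulled out of the full model's summary directly. This is a minimal sketch; `fit_full`, `adj_r2` and `coef_p` are illustrative names.

```r
# Fresh copy of the built-in data, so the session's mtcars is left untouched
dat <- datasets::mtcars
dat$am <- factor(dat$am, levels = c(0, 1), labels = c("AUTO", "MAN"))

# Regress mpg on every other column
fit_full <- lm(mpg ~ ., data = dat)
full_summary <- summary(fit_full)

# Share of variation explained, adjusted for the number of predictors
adj_r2 <- full_summary$adj.r.squared

# p-value of each predictor's coefficient (intercept row dropped)
coef_p <- full_summary$coefficients[-1, "Pr(>|t|)"]

print(round(adj_r2, 3))
print(round(sort(coef_p), 3))
```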

ANOVA Test

We must test whether there are significant differences between the models. Based on the F-statistic and the p-value, we can reject the null hypothesis that the models fit equally well, which indicates that the addition of weight and acceleration (1/4 mile time) does affect MPG. We can conclude that, when we hold the weight and the acceleration of a car constant, a manual transmission increases MPG by an average of 2.94.
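The comparison described above can be sketched as a nested-model F-test with `anova()`. The adjusted model is written out by hand here with the same formula that the stepwise search returns (`mpg ~ wt + qsec + am`); `fit_am`, `fit_adj` and `model_comparison` are illustrative names.

```r
# Fresh copy of the built-in data, so the session's mtcars is left untouched
dat <- datasets::mtcars
dat$am <- factor(dat$am, levels = c(0, 1), labels = c("AUTO", "MAN"))

# Nested models: transmission only, then transmission plus weight and 1/4 mile time
fit_am <- lm(mpg ~ am, data = dat)
fit_adj <- lm(mpg ~ wt + qsec + am, data = dat)

# F-test of the null hypothesis that the extra terms do not improve the fit
model_comparison <- anova(fit_am, fit_adj)
print(model_comparison)

# p-value of the comparison (second row of the table)
anova_p <- model_comparison[["Pr(>F)"]][2]
print(anova_p)
```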

library(ggplot2)
data("mtcars")
hist(mtcars$mpg, breaks=12, xlab="Miles Per Gallon (MPG)", main="MPG Distribution", col="yellow")

library(ggplot2)
mtcars$am <- as.factor(mtcars$am)
levels(mtcars$am) <-c("AUTO", "MAN")

g <- ggplot(aes(x = am, y = mpg), data = mtcars)  
g <- g + geom_boxplot(aes(fill = am))
g + labs(x = "Transmission Type", y = "MPG", title = "MPG by Transmission Type") +
        theme(plot.title = element_text(color="Orange", face="bold",hjust=0.5)) +
        scale_fill_manual(values=c("red","Blue"))

Question 1 Answer

The two-sample t-test indicates that we should reject the null hypothesis that the mean MPG is the same for both transmission types. In this sample, MPG is greater in manual transmission cars than in automatic transmission cars.
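The t-test call itself does not appear in the report, so the sketch below shows one way it could be run. It assumes R's default Welch two-sample test, which does not assume equal variances; `dat` and `tt` are illustrative names.

```r
# Fresh copy of the built-in data, so the session's mtcars is left untouched
dat <- datasets::mtcars
dat$am <- factor(dat$am, levels = c(0, 1), labels = c("AUTO", "MAN"))

# Welch two-sample t-test of mean MPG, automatic vs manual
tt <- t.test(mpg ~ am, data = dat)
print(tt)

# Group means (AUTO first, then MAN) and the p-value
print(tt$estimate)
print(tt$p.value)
```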

Question 2 Answer

The results of this model suggest that fuel efficiency is higher in manual cars than in automatic cars by around 3 MPG (2.94), holding weight and 1/4 mile time constant.
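A point estimate alone does not show how uncertain the difference is. As a sketch, a 95% confidence interval for the transmission coefficient can be read off the adjusted model with `confint()`. This interval is not reported in the original analysis; `fit_adj` and `am_ci` are illustrative names.

```r
# Fresh copy of the built-in data, so the session's mtcars is left untouched
dat <- datasets::mtcars
dat$am <- factor(dat$am, levels = c(0, 1), labels = c("AUTO", "MAN"))

# Same formula as the stepwise-selected model
fit_adj <- lm(mpg ~ wt + qsec + am, data = dat)

# 95% confidence interval for the manual-vs-automatic difference,
# holding weight and 1/4 mile time constant
am_ci <- confint(fit_adj, "amMAN", level = 0.95)
print(round(am_ci, 2))
```

An interval that excludes zero agrees with the coefficient's p-value falling below 0.05.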

Linear Regression
fit <- lm(mpg ~ am, data = mtcars)
summary(fit)
## 
## Call:
## lm(formula = mpg ~ am, data = mtcars)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -9.3923 -3.0923 -0.2974  3.2439  9.5077 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   17.147      1.125  15.247 1.13e-15 ***
## amMAN          7.245      1.764   4.106 0.000285 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 4.902 on 30 degrees of freedom
## Multiple R-squared:  0.3598, Adjusted R-squared:  0.3385 
## F-statistic: 16.86 on 1 and 30 DF,  p-value: 0.000285
Multi Regression
fit2 <- lm(mpg ~ . , data = mtcars)
Stepwise Regression
stepwise <- stepAIC(fit2, direction="both", trace=FALSE)
summary(stepwise)
## 
## Call:
## lm(formula = mpg ~ wt + qsec + am, data = mtcars)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -3.4811 -1.5555 -0.7257  1.4110  4.6610 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   9.6178     6.9596   1.382 0.177915    
## wt           -3.9165     0.7112  -5.507 6.95e-06 ***
## qsec          1.2259     0.2887   4.247 0.000216 ***
## amMAN         2.9358     1.4109   2.081 0.046716 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2.459 on 28 degrees of freedom
## Multiple R-squared:  0.8497, Adjusted R-squared:  0.8336 
## F-statistic: 52.75 on 3 and 28 DF,  p-value: 1.21e-11
Some Residual Plots for ya
par(mfrow = c(2,2))
plot(stepwise)