While working for Motor Trend magazine, I was tasked with finding out which transmission type is better for MPG: manual or automatic. For this project I'll be using the mtcars dataset, starting with some exploratory analysis to check multiple variables.
You work for Motor Trend, a magazine about the automobile industry. Looking at a data set of a collection of cars, they are interested in exploring the relationship between a set of variables and miles per gallon (MPG) (outcome). They are particularly interested in the following two questions:
mpg = miles per gallon; wt = weight (1000 lbs); qsec = 1/4 mile time (seconds)
library(ggplot2)
data(mtcars)
mtcars[1:3, ]
## mpg cyl disp hp drat wt qsec vs am gear carb
## Mazda RX4 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4
## Mazda RX4 Wag 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4
## Datsun 710 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1
library(data.table)
library(scales)
library(grid)
library(gridExtra)
library(MASS)
A histogram is used to check whether MPG is approximately normally distributed before running the inference and regression analysis on it. The distribution is approximately normal. Box plots were used to compare the median MPG of the two transmission types.
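As a numeric complement to the visual check, the sketch below runs a Shapiro-Wilk test on MPG. This test is an addition for illustration and is not part of the original analysis; the variable name `sw` is hypothetical.

```r
# Supplementary normality check (an illustrative addition, not the original method)
data(mtcars)

# Shapiro-Wilk test; the null hypothesis is that mpg is normally distributed
sw <- shapiro.test(mtcars$mpg)
print(sw)

# A p-value above 0.05 means normality is not rejected at the 5% level
sw$p.value > 0.05
```

With only 32 observations the test has limited power, so it is best read alongside the histogram rather than instead of it.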
Intercept = 17.15 and slope = 7.24. With an automatic transmission the average MPG is 17.15, and for a manual transmission the model estimates an increase of 7.24 MPG. Adjusted R-squared = 33.9% and p-value = 0.000285, so this model is significant, although transmission type alone explains only about a third of the variation in MPG.
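To see where the intercept and slope come from, the following sketch compares the coefficients of the single-predictor model with the raw group means. It assumes a freshly loaded `mtcars`, where `am` is still numeric (0 = automatic, 1 = manual), so the slope is named `am` rather than `amMAN`; the names `group_means` and `fit_am` are hypothetical.

```r
# Sketch: with one binary predictor, the intercept is the mean of the
# reference group and the slope is the difference between group means
data(mtcars)

group_means <- tapply(mtcars$mpg, mtcars$am, mean)  # names "0" and "1"
fit_am <- lm(mpg ~ am, data = mtcars)

intercept <- unname(coef(fit_am)["(Intercept)"])
slope     <- unname(coef(fit_am)["am"])
mean_diff <- unname(group_means["1"] - group_means["0"])

# Both comparisons should return TRUE
isTRUE(all.equal(intercept, unname(group_means["0"])))
isTRUE(all.equal(slope, mean_diff))
```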
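The full model with all variables (fit2) explains about 81% of the variation in MPG, but judging by the p-values its individual coefficients are not significant, so the model should be fit again with fewer predictors.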
We must test whether there are significant differences between the models. Based on the F-statistic and the p-value, we can reject the null hypothesis, indicating that the models are different and that the addition of weight and acceleration (quarter-mile time) does affect MPG. We can conclude that, holding the weight and acceleration of a car constant, a manual transmission increases MPG by an average of 2.94 mpg.
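The comparison code itself is not shown in the text. One common way to run such a test is a nested-model F-test with `anova()`, sketched below under that assumption. The model names `fit_simple` and `fit_final` are hypothetical, and the final model is written out directly instead of being rebuilt through the stepwise search.

```r
# Sketch: nested-model F-test (assumed method; the original code is not shown)
data(mtcars)

fit_simple <- lm(mpg ~ am, data = mtcars)              # transmission only
fit_final  <- lm(mpg ~ wt + qsec + am, data = mtcars)  # adds weight and 1/4 mile time

# H0: the coefficients of wt and qsec are both zero
cmp <- anova(fit_simple, fit_final)
print(cmp)

# p-value of the F-test, taken from the second row of the table
p_value <- cmp[["Pr(>F)"]][2]
p_value < 0.05
```

A small p-value here supports keeping weight and quarter-mile time in the model.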
library(ggplot2)
data("mtcars")
hist(mtcars$mpg, breaks=12, xlab="Miles Per Gallon (MPG)", main="MPG Distribution", col="yellow")
library(ggplot2)
# Recode transmission as a labelled factor (0 = automatic, 1 = manual)
mtcars$am <- as.factor(mtcars$am)
levels(mtcars$am) <- c("AUTO", "MAN")
# Box plot of MPG for each transmission type
g <- ggplot(aes(x = am, y = mpg), data = mtcars)
g <- g + geom_boxplot(aes(fill = am))
g + labs(x = "Transmission Type", y = "MPG", title = "MPG by Transmission Type") +
  theme(plot.title = element_text(color = "orange", face = "bold", hjust = 0.5)) +
  scale_fill_manual(values = c("red", "blue"))
The two-sample t-test indicates that we should reject the null hypothesis that the mean MPG is the same for both transmission types. Mean MPG is greater for manual-transmission cars than for automatic-transmission cars.
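The t-test call is not shown in the text, so the sketch below is one plausible version of it. It assumes R's default Welch two-sample test (unequal variances), which may differ from the variant actually used. The name `tt` is hypothetical.

```r
# Sketch: two-sample t-test of MPG by transmission type
# Assumption: Welch test (R's default); the original call is not shown
data(mtcars)

# am is numeric in a fresh mtcars: 0 = automatic, 1 = manual
tt <- t.test(mpg ~ am, data = mtcars)
print(tt)

tt$estimate        # mean MPG in group 0 (automatic) and group 1 (manual)
tt$p.value < 0.05  # TRUE means the equal-means null hypothesis is rejected
```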
The results of the final regression model (mpg ~ wt + qsec + am) suggest that fuel efficiency is higher in manual cars than in automatic cars by around 3 mpg.
fit2 <- lm(mpg ~ . , data = mtcars)
fit <- lm(mpg ~ am, data = mtcars)
summary(fit)
##
## Call:
## lm(formula = mpg ~ am, data = mtcars)
##
## Residuals:
## Min 1Q Median 3Q Max
## -9.3923 -3.0923 -0.2974 3.2439 9.5077
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 17.147 1.125 15.247 1.13e-15 ***
## amMAN 7.245 1.764 4.106 0.000285 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 4.902 on 30 degrees of freedom
## Multiple R-squared: 0.3598, Adjusted R-squared: 0.3385
## F-statistic: 16.86 on 1 and 30 DF, p-value: 0.000285
fit2 <- lm(mpg ~ . , data = mtcars)
stepwise <- stepAIC(fit2, direction="both", trace=FALSE)
summary(stepwise)
##
## Call:
## lm(formula = mpg ~ wt + qsec + am, data = mtcars)
##
## Residuals:
## Min 1Q Median 3Q Max
## -3.4811 -1.5555 -0.7257 1.4110 4.6610
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 9.6178 6.9596 1.382 0.177915
## wt -3.9165 0.7112 -5.507 6.95e-06 ***
## qsec 1.2259 0.2887 4.247 0.000216 ***
## amMAN 2.9358 1.4109 2.081 0.046716 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.459 on 28 degrees of freedom
## Multiple R-squared: 0.8497, Adjusted R-squared: 0.8336
## F-statistic: 52.75 on 3 and 28 DF, p-value: 1.21e-11
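The transmission coefficient above has a p-value of 0.0467, close to the 0.05 threshold, so an interval estimate is a useful complement to the point estimate. The sketch below computes a 95% confidence interval with `confint()`. It refits the selected model directly on a fresh `mtcars`, where `am` is numeric and the coefficient is named `am` instead of `amMAN`; `fit_final` and `ci_am` are hypothetical names.

```r
# Sketch: 95% confidence interval for the transmission effect in the selected model
data(mtcars)

fit_final <- lm(mpg ~ wt + qsec + am, data = mtcars)

# Interval for the am coefficient (manual vs automatic, holding wt and qsec fixed)
ci_am <- confint(fit_final, "am", level = 0.95)
print(ci_am)

# The lower bound is above zero, in line with the p-value below 0.05
ci_am[1, 1] > 0
```

The interval is wide compared with the estimate, so the size of the manual-transmission advantage is uncertain even though its direction is supported.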
par(mfrow = c(2,2))
plot(stepwise)