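We are interested in exploring the relationship between a set of variables and miles per gallon (MPG). We are particularly interested in two questions: whether an automatic or a manual transmission is better for MPG, and how large the MPG difference between the two transmission types is.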
To condense the report, code used for the analysis can be found in Appendix A.
The data ships with base R in the datasets package, so all that needs to be done is load the mtcars data. The head function gives a quick glimpse of the first few rows.
               mpg cyl disp  hp drat    wt  qsec vs am gear carb
Mazda RX4     21.0   6  160 110 3.90 2.620 16.46  0  1    4    4
Mazda RX4 Wag 21.0   6  160 110 3.90 2.875 17.02  0  1    4    4
Datsun 710    22.8   4  108  93 3.85 2.320 18.61  1  1    4    1
There are 32 observations of 11 variables. Descriptions of the variables are available through the ?mtcars command.
Mpg is to be tested against the other variables. First, the correlation between mpg and each remaining variable is computed while every column is still numeric. Transmission (am) is stored as a numeric 0/1 variable, so it is then converted to a factor and its levels are relabelled Automatic (0) and Manual (1).
           cyl       disp         hp      drat         wt     qsec
[1,] -0.852162 -0.8475514 -0.7761684 0.6811719 -0.8676594 0.418684
            vs        am      gear       carb
[1,] 0.6640389 0.5998324 0.4802848 -0.5509251
This shows that number of cylinders, displacement, horsepower, weight, and number of carburetors are negatively correlated with mpg, whereas rear axle ratio, quarter-mile time, engine shape (V/S), transmission, and number of gears are positively correlated. Since manual transmission is coded as 1 (automatic is 0), the positive correlation indicates that mpg tends to be higher with manual transmission. A test is needed to establish whether the difference is statistically significant and to quantify it.
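When reading the sign of the am correlation, it helps to confirm how the variable is coded. The sketch below is a minimal check, assuming the coding documented in ?mtcars (0 = automatic, 1 = manual). The names raw_cars, group_means, and r_am are illustrative and do not appear in the original analysis; a fresh copy of the data is used so that am is still numeric.

```r
# Illustrative sketch: check the coding of `am` before interpreting its sign.
# `raw_cars` is a hypothetical name for an untouched copy of the data.
raw_cars <- datasets::mtcars

# Mean mpg for each raw value of am (per ?mtcars: 0 = automatic, 1 = manual)
group_means <- tapply(raw_cars$mpg, raw_cars$am, mean)
print(group_means)

# Correlation between mpg and the numeric 0/1 transmission indicator
r_am <- cor(raw_cars$mpg, raw_cars$am)
print(r_am)
```

If the mean for group 1 is the larger of the two, a positive correlation with am corresponds to higher mpg for manual cars.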
With only 32 observations, a t-test is an appropriate way to test the hypothesis. A Welch two-sample t-test is run at the 95% confidence level.
Welch Two Sample t-test
data: mtcars$mpg by mtcars$am
t = -3.7671, df = 18.332, p-value = 0.001374
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
-11.280194 -3.209684
sample estimates:
mean in group Automatic    mean in group Manual
               17.14737                24.39231
Since the p-value (0.001374) is lower than 0.05, the difference in mean mpg between automatic and manual transmission is statistically significant. A visualization of the difference is shown in Appendix B. However, to quantify the difference while accounting for the other variables, a regression analysis needs to be run.
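To make the test above less of a black box, the Welch statistic can be computed by hand. This is a sketch under the usual Welch-Satterthwaite formulas, not the report's own code. All variable names (raw_cars, auto, manual, t_stat, df_welch, p_value) are illustrative, and am is assumed to be coded 0 = automatic, 1 = manual as in ?mtcars.

```r
# Illustrative sketch: Welch two-sample t-test computed step by step.
raw_cars <- datasets::mtcars
auto   <- raw_cars$mpg[raw_cars$am == 0]  # automatic cars
manual <- raw_cars$mpg[raw_cars$am == 1]  # manual cars

# Squared standard error of each group mean
se2_auto   <- var(auto) / length(auto)
se2_manual <- var(manual) / length(manual)

# Test statistic: difference in means over the combined standard error
t_stat <- (mean(auto) - mean(manual)) / sqrt(se2_auto + se2_manual)

# Welch-Satterthwaite approximation to the degrees of freedom
df_welch <- (se2_auto + se2_manual)^2 /
  (se2_auto^2 / (length(auto) - 1) + se2_manual^2 / (length(manual) - 1))

# Two-sided p-value from the t distribution
p_value <- 2 * pt(-abs(t_stat), df = df_welch)

print(c(t = t_stat, df = df_welch, p = p_value))
```

These quantities should reproduce the t statistic, degrees of freedom, and p-value reported by t.test, which is a useful check that the group ordering (automatic minus manual) is the one intended.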
The first step is to fit a linear regression model of mpg against all of the remaining variables. This shows which variables are statistically significant predictors of mpg.
Call:
lm(formula = mpg ~ ., data = mtcars)
Residuals:
    Min      1Q  Median      3Q     Max
-3.4506 -1.6044 -0.1196  1.2193  4.6271
Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) 12.30337   18.71788   0.657   0.5181
cyl         -0.11144    1.04502  -0.107   0.9161
disp         0.01334    0.01786   0.747   0.4635
hp          -0.02148    0.02177  -0.987   0.3350
drat         0.78711    1.63537   0.481   0.6353
wt          -3.71530    1.89441  -1.961   0.0633 .
qsec         0.82104    0.73084   1.123   0.2739
vs           0.31776    2.10451   0.151   0.8814
amManual     2.52023    2.05665   1.225   0.2340
gear         0.65541    1.49326   0.439   0.6652
carb        -0.19942    0.82875  -0.241   0.8122
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 2.65 on 21 degrees of freedom
Multiple R-squared: 0.869, Adjusted R-squared: 0.8066
F-statistic: 13.93 on 10 and 21 DF, p-value: 3.793e-07
Looking at the Pr(>|t|) column, wt is the only variable that is close to being statistically significant (p = 0.0633). Not all variables need to be included. R's step() function does the heavy lifting here: it performs stepwise model selection by AIC, dropping terms until it arrives at a smaller, better-fitting model.
Call:
lm(formula = mpg ~ wt + qsec + am, data = mtcars)
Residuals:
    Min      1Q  Median      3Q     Max
-3.4811 -1.5555 -0.7257  1.4110  4.6610
Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)   9.6178     6.9596   1.382 0.177915
wt           -3.9165     0.7112  -5.507 6.95e-06 ***
qsec          1.2259     0.2887   4.247 0.000216 ***
amManual      2.9358     1.4109   2.081 0.046716 *
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 2.459 on 28 degrees of freedom
Multiple R-squared: 0.8497, Adjusted R-squared: 0.8336
F-statistic: 52.75 on 3 and 28 DF, p-value: 1.21e-11
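The selection above can be checked by comparing the AIC of the full model with that of the model step() returns. The sketch below assumes backward elimination, which is what step() does by default when no scope is supplied. The names raw_cars, full_fit, and step_fit are illustrative and not part of the original analysis.

```r
# Illustrative sketch: compare the full model with the stepwise-selected one.
raw_cars <- datasets::mtcars
raw_cars$am <- factor(raw_cars$am, labels = c("Automatic", "Manual"))

# Full model: mpg regressed on every other variable
full_fit <- lm(mpg ~ ., data = raw_cars)

# Backward stepwise selection by AIC (trace = 0 suppresses the step log)
step_fit <- step(full_fit, direction = "backward", trace = 0)

# Terms that survive selection, and AIC for both models (lower is better)
print(formula(step_fit))
print(c(full = AIC(full_fit), selected = AIC(step_fit)))
```

step() ranks models with extractAIC(), which differs from AIC() only by a constant for a fixed data set, so the ordering of the two models is the same under either function.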
Based on the step function, weight, quarter-mile time, and transmission type are the most important variables, with weight and quarter-mile time being the most statistically significant. Because the coefficient is amManual, with automatic as the baseline level, this analysis shows that manual transmission adds roughly 2.936 miles per gallon compared to automatic transmission when weight and quarter-mile time are held constant. Appendix B details the residuals of the step model.
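The uncertainty around the transmission estimate can be examined with a confidence interval for the amManual coefficient, which measures the Manual level relative to the Automatic baseline. This is a minimal sketch that refits the selected model directly. The names raw_cars, final_fit, am_estimate, and am_ci are illustrative and not part of the original analysis.

```r
# Illustrative sketch: interval estimate for the transmission effect.
raw_cars <- datasets::mtcars
raw_cars$am <- factor(raw_cars$am, labels = c("Automatic", "Manual"))

# Refit the model chosen by step(): mpg on weight, quarter-mile time, and am
final_fit <- lm(mpg ~ wt + qsec + am, data = raw_cars)

# Point estimate: mpg difference for Manual relative to Automatic,
# holding wt and qsec fixed
am_estimate <- coef(final_fit)[["amManual"]]

# 95% confidence interval for that coefficient
am_ci <- confint(final_fit, "amManual", level = 0.95)

print(am_estimate)
print(am_ci)
```

An interval that excludes zero is consistent with the coefficient's p-value being below 0.05, while its width shows how imprecise the estimate is with only 32 cars.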
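The t-test shows, at the 95% confidence level, that manual transmission cars get better gas mileage than automatic transmission cars. After adjusting for weight and quarter-mile time, manual transmission is better by an estimated 2.936 miles per gallon on average. The analysis also shows that weight and quarter-mile time are more strongly significant predictors of mpg than transmission type.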
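# Appendix A: Analysis Code
# Data Workup
data(mtcars)      # load the built-in data set
head(mtcars, 3)   # preview the first three rows
# Exploratory Analysis
# Correlate mpg with every other column (am must still be numeric here)
cor(mtcars$mpg, mtcars[, -1])
# Recode transmission as a labelled factor: 0 = Automatic, 1 = Manual
mtcars$am <- as.factor(mtcars$am)
levels(mtcars$am) <- c("Automatic", "Manual")
# Confidence Testing
t.test(mtcars$mpg ~ mtcars$am, conf.level = 0.95)
# Initial Linear Model
summary(lm(mpg ~ ., data = mtcars))
# Step Model
summary(step(lm(mpg ~ ., data = mtcars), trace = 0))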
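# Appendix B: Figures
# Boxplot of MPG by Transmission Type
boxplot(mpg ~ am, data = mtcars, xlab = "Transmission Type",
        ylab = "MPG", main = "Miles Per Gallon Vs. Transmission Type")
# Residual Diagnostics for the Step Model (2 x 2 panel of plots)
par(mfrow = c(2, 2))
plot(step(lm(mpg ~ ., data = mtcars), trace = 0))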