Executive Summary

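We are interested in exploring the relationship between a set of variables and miles per gallon (MPG). We are particularly interested in two questions: whether an automatic or manual transmission is better for MPG, and how large the MPG difference between the two transmission types is.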

To condense the report, code used for the analysis can be found in Appendix A.

Exploratory Analysis

The data ships with R in the datasets package, so all that needs to be done is load the mtcars data. The head function can be used to get a quick glimpse of the data.

               mpg cyl disp  hp drat    wt  qsec vs am gear carb
Mazda RX4     21.0   6  160 110 3.90 2.620 16.46  0  1    4    4
Mazda RX4 Wag 21.0   6  160 110 3.90 2.875 17.02  0  1    4    4
Datsun 710    22.8   4  108  93 3.85 2.320 18.61  1  1    4    1

There are 32 observations of 11 variables. Descriptions of the variables are available through the ?mtcars command.

Mpg is to be tested against the other variables. First, the correlation between mpg and each of the remaining variables is computed. Because transmission (am) is stored as a numeric variable, it is then converted to a factor, and its levels are relabeled "Automatic" (0) and "Manual" (1).

           cyl       disp         hp      drat         wt     qsec
[1,] -0.852162 -0.8475514 -0.7761684 0.6811719 -0.8676594 0.418684
            vs        am      gear       carb
[1,] 0.6640389 0.5998324 0.4802848 -0.5509251

This shows that number of cylinders, displacement, horsepower, weight, and number of carburetors are negatively correlated with mpg, whereas rear axle ratio, quarter mile time, V/S, transmission, and number of gears are positively correlated. Since manual transmission is coded as 1 (automatic is 0), the positive correlation means mpg tends to be higher with a manual transmission. A test is needed to establish whether the difference is significant and to quantify it.
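The direction of the am correlation depends on how the variable is coded. As a quick check, the following sketch compares the mean mpg in each am group on an untouched copy of the data; per ?mtcars, am is 0 for automatic and 1 for manual. The names cars, group_means, and am_cor are illustrative only and are not part of the original analysis.

```r
# Work on a fresh copy so the factor conversion in Appendix A is not affected.
# 'cars' is a hypothetical name used only for this sketch.
cars <- datasets::mtcars

# Mean mpg by transmission code (0 = automatic, 1 = manual, per ?mtcars)
group_means <- tapply(cars$mpg, cars$am, mean)
print(group_means)

# Sign of the correlation between mpg and the 0/1 transmission code
am_cor <- cor(cars$mpg, cars$am)
print(am_cor)
```

A higher mean in group 1 together with a positive correlation indicates that the manual cars in this sample have the higher mpg.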

With only 32 observations, a t-test is an appropriate way to test whether mean mpg differs between the two transmission types. A Welch two-sample t-test is run at a 95% confidence level.


    Welch Two Sample t-test

data:  mtcars$mpg by mtcars$am
t = -3.7671, df = 18.332, p-value = 0.001374
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -11.280194  -3.209684
sample estimates:
mean in group Automatic    mean in group Manual 
               17.14737                24.39231 

Since the p-value is lower than .05, the difference in mean mpg between automatic and manual transmission is statistically significant. A visualization of the difference is shown in Appendix B. However, to quantify the difference while accounting for the other variables, a regression analysis needs to be run.
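As a minimal sketch of how the size of the unadjusted difference can be read from the test, the following reruns the same Welch test on a copy of the data and extracts the group means and confidence interval. The names cars, tt, and mean_diff are illustrative and not part of the original analysis.

```r
# 'cars' is a hypothetical copy so the session's mtcars is left untouched
cars <- datasets::mtcars
cars$am <- factor(cars$am, levels = c(0, 1), labels = c("Automatic", "Manual"))

# Welch two-sample t-test (unequal variances is the default in t.test)
tt <- t.test(mpg ~ am, data = cars, conf.level = 0.95)

# Difference in group means, manual minus automatic
mean_diff <- unname(tt$estimate[2] - tt$estimate[1])
print(mean_diff)

# Interval for automatic minus manual, as reported by t.test
print(tt$conf.int)
```

From the sample estimates above, the unadjusted gap is 24.39 - 17.15, or about 7.24 mpg in favor of manual. This figure does not control for any other variable.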

Regression Analyses

The first step is to fit mpg to a linear regression model against all of the remaining variables. This will show which variables are statistically significant predictors of mpg.


Call:
lm(formula = mpg ~ ., data = mtcars)

Residuals:
    Min      1Q  Median      3Q     Max 
-3.4506 -1.6044 -0.1196  1.2193  4.6271 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)  
(Intercept) 12.30337   18.71788   0.657   0.5181  
cyl         -0.11144    1.04502  -0.107   0.9161  
disp         0.01334    0.01786   0.747   0.4635  
hp          -0.02148    0.02177  -0.987   0.3350  
drat         0.78711    1.63537   0.481   0.6353  
wt          -3.71530    1.89441  -1.961   0.0633 .
qsec         0.82104    0.73084   1.123   0.2739  
vs           0.31776    2.10451   0.151   0.8814  
amManual     2.52023    2.05665   1.225   0.2340  
gear         0.65541    1.49326   0.439   0.6652  
carb        -0.19942    0.82875  -0.241   0.8122  
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 2.65 on 21 degrees of freedom
Multiple R-squared:  0.869, Adjusted R-squared:  0.8066 
F-statistic: 13.93 on 10 and 21 DF,  p-value: 3.793e-07

Looking at the Pr(>|t|) column, wt is the only variable that comes close to statistical significance (p = 0.0633), so not all variables need to be included. Luckily, R has a function that does the heavy lifting of choosing a smaller model by AIC: step().


Call:
lm(formula = mpg ~ wt + qsec + am, data = mtcars)

Residuals:
    Min      1Q  Median      3Q     Max 
-3.4811 -1.5555 -0.7257  1.4110  4.6610 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)   9.6178     6.9596   1.382 0.177915    
wt           -3.9165     0.7112  -5.507 6.95e-06 ***
qsec          1.2259     0.2887   4.247 0.000216 ***
amManual      2.9358     1.4109   2.081 0.046716 *  
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 2.459 on 28 degrees of freedom
Multiple R-squared:  0.8497,    Adjusted R-squared:  0.8336 
F-statistic: 52.75 on 3 and 28 DF,  p-value: 1.21e-11
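The selection above can be sketched as follows. When no scope is given, step() performs backward elimination, dropping terms for as long as doing so lowers the AIC. The comparison with AIC() is an added illustration, and cars, full, and reduced are hypothetical names.

```r
# 'cars' is a hypothetical copy with am relabeled as in Appendix A
cars <- datasets::mtcars
cars$am <- factor(cars$am, levels = c(0, 1), labels = c("Automatic", "Manual"))

# Full model with every remaining variable as a predictor
full <- lm(mpg ~ ., data = cars)

# Backward elimination by AIC; trace = 0 suppresses the step-by-step log
reduced <- step(full, trace = 0)
print(formula(reduced))

# step() ranks models with extractAIC(); for lm fits on the same data this
# differs from AIC() only by a constant, so the ordering is the same.
print(c(full = AIC(full), reduced = AIC(reduced)))
```

A lower AIC for the reduced model indicates that the dropped variables did not improve the fit enough to justify the extra parameters.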

Based on the step function, weight, quarter mile time, and transmission type are the most important variables, with weight and quarter mile time being the most statistically significant. This analysis shows that manual transmission adds roughly 2.936 miles per gallon compared to automatic transmission when weight and quarter mile time are held constant. Appendix C details the residuals of the step model.
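To attach an uncertainty range to the transmission coefficient, a confidence interval can be computed from the selected model. This is a minimal sketch that refits the three-variable model directly rather than rerunning step(); cars, fit, am_est, and am_ci are illustrative names.

```r
# 'cars' is a hypothetical copy with am relabeled as in Appendix A
cars <- datasets::mtcars
cars$am <- factor(cars$am, levels = c(0, 1), labels = c("Automatic", "Manual"))

# Model chosen by step(): mpg explained by weight, quarter mile time, transmission
fit <- lm(mpg ~ wt + qsec + am, data = cars)

# Point estimate and 95% confidence interval for the amManual coefficient
am_est <- coef(fit)[["amManual"]]
am_ci <- confint(fit, "amManual", level = 0.95)
print(am_est)
print(am_ci)
```

An interval that excludes zero is consistent with the p-value of 0.0467 reported for amManual above.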

Final Results

At the 5% significance level, manual transmission is better than automatic transmission by about 2.936 miles per gallon on average, holding weight and quarter mile time constant. The analysis also showed that weight and quarter mile time are more strongly significant predictors of mpg than transmission type.

Appendices

Appendix A: Code Used

# Data Workup
data(mtcars)
head(mtcars, 3)
# Exploratory Analysis
cor(mtcars$mpg, mtcars[, -1])
mtcars$am <- as.factor(mtcars$am)
levels(mtcars$am) <- c("Automatic", "Manual")
# Confidence Testing
t.test(mtcars$mpg ~ mtcars$am, conf.level = 0.95)
# Initial Linear Model
summary(lm(mpg ~ ., data = mtcars))
# Step Model
summary(step(lm(mpg ~ ., data = mtcars), trace = 0))

Appendix B: Box Plot of Mpg vs Transmission Types

boxplot(mpg ~ am, data = mtcars, xlab = "Transmission Type",
        ylab = "MPG", main = "Miles Per Gallon Vs. Transmission Type")

Appendix C: Step Model Residuals

par(mfrow=c(2,2))
plot(step(lm(data = mtcars, mpg ~ .), trace=0))