Executive Summary

This regression analysis uses the mtcars dataset and is written from the perspective of a Motor Trend magazine employee. It reviews the relationship between a set of vehicle variables and miles per gallon (MPG), the outcome variable of interest. The project investigates two primary questions:

  1. “Is an automatic or manual transmission better for the MPG?”
  2. “Quantify the MPG difference between the automatic and the manual transmissions.”

Key Findings

From the simple regression of mpg on transmission type, we find that manual transmission vehicles average 7.245 more MPG than automatic transmission vehicles. The \(R^2\) value of this model is 0.3598, so it explains only 35.98% of the variance in MPG and further analysis is required. The stepwise model selection in the appendix indicates that mpg ~ wt + qsec + am is the best model; in that model the manual transmission coefficient falls to 2.936 MPG once weight and quarter-mile time are held constant.

Reading in the data: description and transformations

The data is read into R, and the necessary variables are converted to factors for the regression analysis.

data(mtcars)
head(mtcars,4)
##                 mpg cyl disp  hp drat    wt  qsec vs am gear carb
## Mazda RX4      21.0   6  160 110 3.90 2.620 16.46  0  1    4    4
## Mazda RX4 Wag  21.0   6  160 110 3.90 2.875 17.02  0  1    4    4
## Datsun 710     22.8   4  108  93 3.85 2.320 18.61  1  1    4    1
## Hornet 4 Drive 21.4   6  258 110 3.08 3.215 19.44  1  0    3    1

Transformations

# create factors with value labels
mtcars$am <- factor(mtcars$am,levels=c(0,1),
   labels=c("Automatic","Manual"))
mtcars$cyl <- factor(mtcars$cyl,levels=c(4,6,8),
   labels=c("4cyl","6cyl","8cyl")) 
mtcars$gear <- factor(mtcars$gear,levels=c(3,4,5),
   labels=c("3gears","4gears","5gears"))

Descriptive Statistics

summary(mtcars)
##       mpg         cyl          disp             hp             drat     
##  Min.   :10.4   4cyl:11   Min.   : 71.1   Min.   : 52.0   Min.   :2.76  
##  1st Qu.:15.4   6cyl: 7   1st Qu.:120.8   1st Qu.: 96.5   1st Qu.:3.08  
##  Median :19.2   8cyl:14   Median :196.3   Median :123.0   Median :3.69  
##  Mean   :20.1             Mean   :230.7   Mean   :146.7   Mean   :3.60  
##  3rd Qu.:22.8             3rd Qu.:326.0   3rd Qu.:180.0   3rd Qu.:3.92  
##  Max.   :33.9             Max.   :472.0   Max.   :335.0   Max.   :4.93  
##        wt            qsec            vs                am         gear   
##  Min.   :1.51   Min.   :14.5   Min.   :0.000   Automatic:19   3gears:15  
##  1st Qu.:2.58   1st Qu.:16.9   1st Qu.:0.000   Manual   :13   4gears:12  
##  Median :3.33   Median :17.7   Median :0.000                  5gears: 5  
##  Mean   :3.22   Mean   :17.8   Mean   :0.438                             
##  3rd Qu.:3.61   3rd Qu.:18.9   3rd Qu.:1.000                             
##  Max.   :5.42   Max.   :22.9   Max.   :1.000                             
##       carb     
##  Min.   :1.00  
##  1st Qu.:2.00  
##  Median :2.00  
##  Mean   :2.81  
##  3rd Qu.:4.00  
##  Max.   :8.00

Exploratory Plots using ggplot2

Various graphics are included in the appendix. Figure 1 is a boxplot of the variable mpg for each level of am (Automatic or Manual). The plot indicates that mpg is higher when the transmission is manual.

Regression Analysis

The first step is to compute the correlation of mpg with each of the other 10 variables. The data is reloaded first because cor() requires numeric columns.

data(mtcars)
cor.out <- sort(cor(mtcars)[,1])
round(cor.out, 3)
##     wt    cyl   disp     hp   carb   qsec   gear     am     vs   drat    mpg 
## -0.868 -0.852 -0.848 -0.776 -0.551  0.419  0.480  0.600  0.664  0.681  1.000 
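Because wt has the strongest correlation with mpg, it is worth checking how the transmission indicator itself relates to the other predictors before reading the am effect in isolation. The following is a minimal sketch, assuming a fresh numeric copy of the data; the names `mt` and `am.cor` are illustrative and not part of the original analysis.

```r
# Fresh numeric copy, so any factored version of mtcars is left untouched
mt <- datasets::mtcars

# Correlation of the transmission indicator (am) with every variable
am.cor <- cor(mt)[, "am"]
round(sort(am.cor), 3)

# Formal test of the mpg-wt correlation (Pearson by default)
cor.test(mt$mpg, mt$wt)
```

A sizeable correlation between am and wt would suggest that part of the raw MPG gap between transmission types reflects vehicle weight rather than the transmission itself.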

The second step is to run a Welch two-sample t-test of mpg by transmission type (am).

t_mt <- t.test(mpg ~ am, data = mtcars)
t_mt
## 
##  Welch Two Sample t-test
## 
## data:  mpg by am
## t = -3.767, df = 18.33, p-value = 0.001374
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -11.28  -3.21
## sample estimates:
## mean in group 0 mean in group 1 
##           17.15           24.39
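The quantities of interest can be pulled out of the test object rather than read off the printout. This is a sketch under the assumption of a fresh numeric copy of the data; `mt`, `tt`, `diff.means` and `ci` are illustrative names.

```r
mt <- datasets::mtcars
tt <- t.test(mpg ~ am, data = mt)

# Difference in group means, expressed as manual minus automatic
diff.means <- unname(tt$estimate[2] - tt$estimate[1])

# 95% confidence interval; t.test reports the first group minus the
# second (automatic minus manual), so both limits are negative
ci <- as.numeric(tt$conf.int)

round(c(difference = diff.means, lower = ci[1], upper = ci[2]), 2)
```

Since the interval excludes zero, the difference in mean mpg between the two transmission types is significant at the 5% level, which agrees with the reported p-value.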

The third step is to run a simple regression of mpg on am. We then use the AIC criterion to select the best model and investigate outliers in the plots provided in the appendix.

reg.1 <- lm( mpg ~ factor(am), data=mtcars)
summary(reg.1)
## 
## Call:
## lm(formula = mpg ~ factor(am), data = mtcars)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
## -9.392 -3.092 -0.297  3.244  9.508 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)    17.15       1.12   15.25  1.1e-15 ***
## factor(am)1     7.24       1.76    4.11  0.00029 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 4.9 on 30 degrees of freedom
## Multiple R-squared:  0.36,   Adjusted R-squared:  0.338 
## F-statistic: 16.9 on 1 and 30 DF,  p-value: 0.000285
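To make the coefficients easier to interpret, the sketch below shows how they relate to the group means and adds confidence intervals. It assumes a fresh numeric copy of the data; `mt`, `fit` and `r2` are illustrative names.

```r
mt  <- datasets::mtcars
fit <- lm(mpg ~ factor(am), data = mt)

# The intercept equals the mean mpg of the automatic group, and the
# slope is the manual-minus-automatic difference in mean mpg
coef(fit)
tapply(mt$mpg, mt$am, mean)

# 95% confidence interval for each coefficient
confint(fit)

# Share of the variance in mpg explained by transmission type alone
r2 <- summary(fit)$r.squared
round(r2, 4)
```

With a single binary predictor, the regression reproduces the comparison of group means, which is why the slope matches the difference seen in the t-test.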

Final model quantifying the mpg difference

best <- lm(mpg ~ am + wt + qsec, data = mtcars)
anova(reg.1, best)
## Analysis of Variance Table
## 
## Model 1: mpg ~ factor(am)
## Model 2: mpg ~ am + wt + qsec
##   Res.Df RSS Df Sum of Sq    F  Pr(>F)    
## 1     30 721                              
## 2     28 169  2       552 45.6 1.6e-09 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
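The ANOVA table shows that adding wt and qsec improves the fit, but it does not show the adjusted transmission effect itself. The sketch below extracts it, again assuming a fresh numeric copy of the data; `mt`, `fit3`, `am.est` and `am.ci` are illustrative names.

```r
mt   <- datasets::mtcars
fit3 <- lm(mpg ~ am + wt + qsec, data = mt)

# Manual-vs-automatic difference in mpg, holding wt and qsec fixed
am.est <- coef(fit3)[["am"]]

# 95% confidence interval for that difference
am.ci <- confint(fit3)["am", ]

round(c(estimate = am.est, am.ci), 3)
```

This adjusted estimate is the one to quote when quantifying the MPG difference, since the unadjusted figure also absorbs differences in weight and acceleration between the two groups.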

Appendix

Description of 11 variables

Position  Variable  Description
[,1]      mpg       Miles/(US) gallon
[,2]      cyl       Number of cylinders
[,3]      disp      Displacement (cubic inches)
[,4]      hp        Gross horsepower
[,5]      drat      Rear axle ratio
[,6]      wt        Weight (lb/1000)
[,7]      qsec      1/4 mile time (seconds)
[,8]      vs        V/S (engine shape: 0 = V-shaped, 1 = straight)
[,9]      am        Transmission (0 = automatic, 1 = manual)
[,10]     gear      Number of forward gears
[,11]     carb      Number of carburetors

Exploratory Plots using ggplot2

Figure 1

library(ggplot2) 
# Boxplots of mpg by transmission style
# observations (points) are overlaid and jittered
qplot(am, mpg, data=mtcars, geom=c("boxplot", "jitter"),
   fill=factor(am),  main="Mileage by Transmission Type",
   xlab="", ylab="Miles per Gallon") 

[Figure 1: boxplot of mpg by transmission type, with jittered points]

Figure 2

# Regression of mpg on transmission type (am)
qplot(factor(am), mpg, data=mtcars, geom=c("point", "smooth"),
   method="lm", formula=y~x, color=am,
   main="Regression of MPG on Transmission Type",
   xlab="Transmission", ylab="Miles per Gallon")
## geom_smooth: Only one unique x value each group.Maybe you want aes(group = 1)?

[Figure 2: mpg plotted against transmission type]

Residual plots

par(mfrow = c(2,2))
plot(best)

[Residual plots: the four diagnostic panels produced by plot(best)]
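The diagnostic panels can be complemented with numeric checks. The following is a sketch only, assuming a fresh numeric copy of the data and the same model formula as `best`; `mt`, `fit3`, `lev` and `cook` are illustrative names.

```r
mt   <- datasets::mtcars
fit3 <- lm(mpg ~ am + wt + qsec, data = mt)

# Leverage (hat values) and Cook's distance for each car
lev  <- hatvalues(fit3)
cook <- cooks.distance(fit3)

# The three cars with the largest influence on the fit
head(sort(cook, decreasing = TRUE), 3)

# Shapiro-Wilk test on the residuals as a rough normality check
shapiro.test(residuals(fit3))
```

Cars that rank high on both leverage and Cook's distance are the ones to inspect first when investigating outliers.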

Additional Findings and Recommendations

one_model <- lm(mpg ~ ., data = mtcars)
find.model <- step(one_model, direction = "both")
## Start:  AIC=70.9
## mpg ~ cyl + disp + hp + drat + wt + qsec + vs + am + gear + carb
## 
##        Df Sum of Sq RSS  AIC
## - cyl   1      0.08 148 68.9
## - vs    1      0.16 148 68.9
## - carb  1      0.41 148 69.0
## - gear  1      1.35 149 69.2
## - drat  1      1.63 149 69.2
## - disp  1      3.92 151 69.7
## - hp    1      6.84 154 70.3
## - qsec  1      8.86 156 70.8
## <none>              148 70.9
## - am    1     10.55 158 71.1
## - wt    1     27.01 174 74.3
## 
## Step:  AIC=68.92
## mpg ~ disp + hp + drat + wt + qsec + vs + am + gear + carb
## 
##        Df Sum of Sq RSS  AIC
## - vs    1      0.27 148 67.0
## - carb  1      0.52 148 67.0
## - gear  1      1.82 149 67.3
## - drat  1      1.98 150 67.3
## - disp  1      3.90 152 67.7
## - hp    1      7.36 155 68.5
## <none>              148 68.9
## - qsec  1     10.09 158 69.0
## - am    1     11.84 159 69.4
## + cyl   1      0.08 148 70.9
## - wt    1     27.03 175 72.3
## 
## Step:  AIC=66.97
## mpg ~ disp + hp + drat + wt + qsec + am + gear + carb
## 
##        Df Sum of Sq RSS  AIC
## - carb  1      0.69 148 65.1
## - gear  1      2.14 150 65.4
## - drat  1      2.21 150 65.4
## - disp  1      3.65 152 65.8
## - hp    1      7.11 155 66.5
## <none>              148 67.0
## - am    1     11.57 159 67.4
## - qsec  1     15.68 164 68.2
## + vs    1      0.27 148 68.9
## + cyl   1      0.19 148 68.9
## - wt    1     27.38 175 70.4
## 
## Step:  AIC=65.12
## mpg ~ disp + hp + drat + wt + qsec + am + gear
## 
##        Df Sum of Sq RSS  AIC
## - gear  1       1.6 150 63.5
## - drat  1       1.9 150 63.5
## <none>              148 65.1
## - disp  1      10.1 159 65.2
## - am    1      12.3 161 65.7
## - hp    1      14.8 163 66.2
## + carb  1       0.7 148 67.0
## + vs    1       0.4 148 67.0
## + cyl   1       0.4 148 67.0
## - qsec  1      26.4 175 68.4
## - wt    1      69.1 218 75.3
## 
## Step:  AIC=63.46
## mpg ~ disp + hp + drat + wt + qsec + am
## 
##        Df Sum of Sq RSS  AIC
## - drat  1       3.3 153 62.2
## - disp  1       8.5 159 63.2
## <none>              150 63.5
## - hp    1      13.3 163 64.2
## + gear  1       1.6 148 65.1
## + cyl   1       1.0 149 65.2
## + vs    1       0.6 149 65.3
## + carb  1       0.1 150 65.4
## - am    1      20.0 170 65.5
## - qsec  1      25.6 176 66.5
## - wt    1      67.6 218 73.4
## 
## Step:  AIC=62.16
## mpg ~ disp + hp + wt + qsec + am
## 
##        Df Sum of Sq RSS  AIC
## - disp  1       6.6 160 61.5
## <none>              153 62.2
## - hp    1      12.6 166 62.7
## + drat  1       3.3 150 63.5
## + gear  1       3.0 150 63.5
## + cyl   1       2.4 151 63.6
## + vs    1       1.1 152 63.9
## + carb  1       0.0 153 64.2
## - qsec  1      26.5 180 65.3
## - am    1      32.2 186 66.3
## - wt    1      69.0 222 72.1
## 
## Step:  AIC=61.52
## mpg ~ hp + wt + qsec + am
## 
##        Df Sum of Sq RSS  AIC
## - hp    1       9.2 169 61.3
## <none>              160 61.5
## + disp  1       6.6 153 62.2
## + carb  1       3.2 157 62.9
## + drat  1       1.4 159 63.2
## - qsec  1      20.2 180 63.3
## + cyl   1       0.2 160 63.5
## + vs    1       0.2 160 63.5
## + gear  1       0.2 160 63.5
## - am    1      26.0 186 64.3
## - wt    1      78.5 239 72.3
## 
## Step:  AIC=61.31
## mpg ~ wt + qsec + am
## 
##        Df Sum of Sq RSS  AIC
## <none>              169 61.3
## + hp    1       9.2 160 61.5
## + carb  1       8.0 161 61.8
## + disp  1       3.3 166 62.7
## + cyl   1       1.5 168 63.0
## + drat  1       1.4 168 63.0
## + gear  1       0.1 169 63.3
## + vs    1       0.0 169 63.3
## - am    1      26.2 195 63.9
## - qsec  1     109.0 278 75.2
## - wt    1     183.3 353 82.8
summary(find.model)
## 
## Call:
## lm(formula = mpg ~ wt + qsec + am, data = mtcars)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
## -3.481 -1.556 -0.726  1.411  4.661 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)    9.618      6.960    1.38  0.17792    
## wt            -3.917      0.711   -5.51    7e-06 ***
## qsec           1.226      0.289    4.25  0.00022 ***
## am             2.936      1.411    2.08  0.04672 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2.46 on 28 degrees of freedom
## Multiple R-squared:  0.85,   Adjusted R-squared:  0.834 
## F-statistic: 52.7 on 3 and 28 DF,  p-value: 1.21e-11
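The AIC values printed by step() can be reproduced directly, which gives a quick way to compare the simple model with the selected one. This is a minimal sketch assuming a fresh numeric copy of the data; `mt`, `m.simple`, `m.final`, `aic.simple` and `aic.final` are illustrative names.

```r
mt <- datasets::mtcars
m.simple <- lm(mpg ~ am, data = mt)
m.final  <- lm(mpg ~ wt + qsec + am, data = mt)

# extractAIC() returns c(edf, AIC) on the same scale that step() prints
aic.simple <- extractAIC(m.simple)[2]
aic.final  <- extractAIC(m.final)[2]

round(c(simple = aic.simple, final = aic.final), 2)
```

Note that AIC() uses a different additive constant than extractAIC(), so its values do not match the step() trace, although the ranking of models fitted to the same data is the same.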