Executive Summary

This is an analysis of the relationship between transmission type and fuel efficiency in miles per gallon (mpg) using the Motor Trend Car Road Tests dataset (mtcars). A simple linear regression of mpg on transmission alone suggests that manual transmission is better for mpg: manual cars average 7.2 mpg more than automatics (95% confidence interval: 3.6 to 10.8 mpg). However, after adjusting for number of cylinders and car weight, there is no significant relationship between transmission and mpg. The estimated difference is only 0.2 mpg in favor of manual cars (95% confidence interval: -2.5 to 2.8 mpg), holding number of cylinders and weight constant.

Exploring Data

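From this data exploration it looks like manual transmission is associated with better mpg. However, number of cylinders, gears, and weight may confound this relationship: in this sample, automatic cars tend to have more cylinders, fewer forward gears, and greater weight than manual cars (see Appendix).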
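A quick numeric check of the unadjusted comparison is to average mpg within each transmission group. With a single 0/1 predictor, the difference in group means is exactly the slope of a simple regression of mpg on `am`, which is why model m1 below reports about 7.2 mpg. This is a minimal base-R sketch; the object names (`group_means`, `raw_diff`, `slope`) are illustrative only.

```r
# Unadjusted comparison of mpg by transmission (am: 0 = automatic, 1 = manual)
data(mtcars)
group_means <- tapply(mtcars$mpg, mtcars$am, mean)
print(round(group_means, 2))

# Difference in group means (manual minus automatic)
raw_diff <- unname(group_means["1"] - group_means["0"])
print(round(raw_diff, 3))

# With a single 0/1 predictor, the lm slope equals the difference in means
slope <- unname(coef(lm(mpg ~ am, data = mtcars))["am"])
stopifnot(isTRUE(all.equal(raw_diff, slope)))
```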

Running Several Models

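# Fit a sequence of nested models, adding one potential confounder at a time
m1 <- lm(mpg ~ am, mtcars)                            # transmission only
m2 <- lm(mpg ~ am + factor(cyl), mtcars)              # + cylinders (as a factor)
m3 <- lm(mpg ~ am + factor(cyl) + wt, mtcars)         # + weight (1000 lbs)
m4 <- lm(mpg ~ am + factor(cyl) + wt + gear, mtcars)  # + number of forward gears
#anova(m1, m2, m3, m4) # m3 is the best model (see Appendix)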
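To see how the transmission estimate changes as each confounder enters, one option is to tabulate the `am` coefficient and its 95% confidence interval for all four models side by side. The sketch below refits the same models so it runs on its own; the `am_table` layout is an illustrative choice, not part of the original analysis.

```r
# Refit the four nested models so this block is self-contained
m1 <- lm(mpg ~ am, mtcars)
m2 <- lm(mpg ~ am + factor(cyl), mtcars)
m3 <- lm(mpg ~ am + factor(cyl) + wt, mtcars)
m4 <- lm(mpg ~ am + factor(cyl) + wt + gear, mtcars)
models <- list(m1 = m1, m2 = m2, m3 = m3, m4 = m4)

# For each model: the 'am' estimate followed by its 95% confidence interval
am_table <- t(sapply(models, function(m) {
  c(estimate = unname(coef(m)["am"]), confint(m)["am", ])
}))
print(round(am_table, 3))
```

Reading down the `estimate` column shows the manual-transmission advantage shrinking toward zero once cylinders and weight are in the model.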

Model Selection and Diagnostics

The nested-model ANOVA (see Appendix) indicates that model 3 fits best: adding weight to model 2 significantly reduces the residual sum of squares (F = 11.91, p = 0.002), while adding gears in model 4 does not improve on model 3 (F = 0.72, p = 0.40). For model 3, the residuals vs. fitted values show no obvious pattern, and the normal Q-Q plot is roughly linear, with some deviation for the largest residuals. The model explains about 84% of the variance in mpg (multiple R-squared = 0.84; adjusted R-squared = 0.81).
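The residuals vs. fitted and Q-Q plots described above correspond to panels `which = 1` and `which = 2` of R's `plot()` method for `lm` objects; the Appendix code only draws panels 3 and 5. A minimal sketch that draws the two described panels and pulls the R-squared values from `summary()`:

```r
m3 <- lm(mpg ~ am + factor(cyl) + wt, data = mtcars)

# Residuals vs. fitted values (which = 1) and normal Q-Q of standardized residuals (which = 2)
par(mfrow = c(1, 2))
plot(m3, which = 1)
plot(m3, which = 2)

# Multiple and adjusted R-squared, as reported by summary()
fit_stats <- c(r2 = summary(m3)$r.squared, adj_r2 = summary(m3)$adj.r.squared)
print(round(fit_stats, 4))
```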

Estimating Coefficients and Uncertainty

summary(m3)$coef[2,1]  # adjusted mpg difference, manual vs. automatic
## [1] 0.1501031
confint(m3)[2,]        # 95% confidence interval for that difference
##     2.5 %    97.5 % 
## -2.517734  2.817941
# see Executive Summary for interpretation
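Because model 3 has no interaction terms, the `am` coefficient is the predicted manual-minus-automatic difference at any fixed combination of cylinders and weight, not only at one reference car. A small sketch to check this with `predict()`; the helper `gap_at()` and the two covariate settings are hypothetical choices for illustration.

```r
m3 <- lm(mpg ~ am + factor(cyl) + wt, data = mtcars)

# Hypothetical helper: predicted mpg for two cars that differ only in transmission
gap_at <- function(cyl, wt) {
  p <- predict(m3, newdata = data.frame(am = c(0, 1), cyl = cyl, wt = wt))
  unname(p[2] - p[1])   # manual minus automatic
}

# The gap is identical at very different settings of the other predictors
gaps <- c(cyl4_mean_wt = gap_at(4, mean(mtcars$wt)),
          cyl8_wt4     = gap_at(8, 4))
print(round(gaps, 4))
```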

Appendix

# other factors that might affect the relationship between transmission and mpg
table(mtcars$am, mtcars$cyl) # rows: am (0 = auto, 1 = manual); automatics tend to have more cylinders (12 of 19 are 8-cylinder)
##    
##      4  6  8
##   0  3  4 12
##   1  8  3  2
table(mtcars$gear, mtcars$am) # manual cars have more forward gears: every 3-gear car is automatic, every 5-gear car is manual
##    
##      0  1
##   3 15  0
##   4  4  8
##   5  0  5
summary(mtcars$wt)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   1.513   2.581   3.325   3.217   3.610   5.424
boxplot(wt~am,data=mtcars, main="Car Weight Data",
        xlab="Transmission, 0=auto 1=manual", ylab="Weight (1000 lbs)") 

# automatic cars tend to be heavier than manual cars
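The boxplot can be backed up numerically by comparing mean and median weight in each group. The Welch two-sample t-test below is an added check that is not part of the original analysis; it is shown only as a rough indication that the weight gap is unlikely to be chance.

```r
# Mean and median weight (1000 lbs) by transmission (0 = automatic, 1 = manual)
wt_mean   <- tapply(mtcars$wt, mtcars$am, mean)
wt_median <- tapply(mtcars$wt, mtcars$am, median)
print(round(rbind(mean = wt_mean, median = wt_median), 3))

# Welch two-sample t-test for a difference in mean weight (added check)
wt_test <- t.test(wt ~ am, data = mtcars)
print(signif(wt_test$p.value, 3))
```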

#summary(m1) # manual cars average 7.245 mpg more than automatics, p = 0.000285
summary(m1)$coef[2,1]
## [1] 7.244939
confint(m1)[2,] #to assess uncertainty
##    2.5 %   97.5 % 
##  3.64151 10.84837
#summary(m2) # after adjusting for cylinders, the transmission effect shrinks
summary(m3)  # transmission is not significant after also adjusting for weight (p = 0.91)
## 
## Call:
## lm(formula = mpg ~ am + factor(cyl) + wt, data = mtcars)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -4.4898 -1.3116 -0.5039  1.4162  5.7758 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   33.7536     2.8135  11.997  2.5e-12 ***
## am             0.1501     1.3002   0.115  0.90895    
## factor(cyl)6  -4.2573     1.4112  -3.017  0.00551 ** 
## factor(cyl)8  -6.0791     1.6837  -3.611  0.00123 ** 
## wt            -3.1496     0.9080  -3.469  0.00177 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2.603 on 27 degrees of freedom
## Multiple R-squared:  0.8375, Adjusted R-squared:  0.8134 
## F-statistic: 34.79 on 4 and 27 DF,  p-value: 2.73e-10
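The `am` row of this table and the interval from `confint(m3)` are linked by the t distribution with 27 residual degrees of freedom (32 cars minus 5 coefficients). A short sketch that rebuilds the t value, the two-sided p-value, and the 95% confidence interval from the estimate and its standard error:

```r
m3 <- lm(mpg ~ am + factor(cyl) + wt, data = mtcars)

coefs  <- summary(m3)$coefficients
est    <- coefs["am", "Estimate"]
se     <- coefs["am", "Std. Error"]
df_res <- df.residual(m3)                          # 32 observations - 5 coefficients

t_val <- est / se                                  # "t value" column
p_val <- 2 * pt(-abs(t_val), df_res)               # two-sided "Pr(>|t|)" column
ci    <- est + c(-1, 1) * qt(0.975, df_res) * se   # 95% CI, as returned by confint()

print(round(c(t = t_val, p = p_val), 4))
print(round(ci, 4))
```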
#summary(m4) #gear not significant

#model selection with anova
anova(m1, m2, m3, m4) #m3 best model
## Analysis of Variance Table
## 
## Model 1: mpg ~ am
## Model 2: mpg ~ am + factor(cyl)
## Model 3: mpg ~ am + factor(cyl) + wt
## Model 4: mpg ~ am + factor(cyl) + wt + gear
##   Res.Df    RSS Df Sum of Sq       F    Pr(>F)    
## 1     30 720.90                                   
## 2     28 264.50  2    456.40 33.3276 6.689e-08 ***
## 3     27 182.97  1     81.53 11.9067  0.001923 ** 
## 4     26 178.03  1      4.94  0.7217  0.403344    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
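When `anova()` is given several nested models, every F statistic in the table uses the residual mean square of the largest model (here m4, 178.03 / 26) as its denominator, not that of the pair being compared. The sketch below reproduces the F and p-value columns from the residual sums of squares to make that explicit.

```r
m1 <- lm(mpg ~ am, mtcars)
m2 <- lm(mpg ~ am + factor(cyl), mtcars)
m3 <- lm(mpg ~ am + factor(cyl) + wt, mtcars)
m4 <- lm(mpg ~ am + factor(cyl) + wt + gear, mtcars)
a  <- anova(m1, m2, m3, m4)

fits <- list(m1, m2, m3, m4)
rss  <- sapply(fits, deviance)       # residual sums of squares (RSS column)
dfr  <- sapply(fits, df.residual)    # residual degrees of freedom (Res.Df column)

# Common denominator: residual mean square of the largest model, m4
scale <- rss[4] / dfr[4]

# Numerator: drop in RSS per extra parameter for each successive model
f_by_hand <- (-diff(rss) / -diff(dfr)) / scale
p_by_hand <- pf(f_by_hand, -diff(dfr), dfr[4], lower.tail = FALSE)
print(round(cbind(F = f_by_hand, p = p_by_hand), 4))
```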
#additional residual plots
par(mfrow = c(1,2))
plot(m3, which=3) # scale-location: sqrt(|standardized residuals|) vs. fitted values
plot(m3, which=5) # standardized residuals vs. leverage, with Cook's distance contours
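The residuals vs. leverage panel can be complemented by listing the cars with the most influence on model 3. The cutoffs below (Cook's distance above 4/n, leverage above 2p/n) are common rules of thumb assumed here for illustration; they are not part of the original analysis.

```r
m3 <- lm(mpg ~ am + factor(cyl) + wt, data = mtcars)

cd  <- cooks.distance(m3)   # influence of each car on the fitted coefficients
lev <- hatvalues(m3)        # leverage: diagonal of the hat matrix

# Rule-of-thumb cutoffs (assumed conventions)
n <- nobs(m3)
p <- length(coef(m3))
flagged <- names(which(cd > 4 / n | lev > 2 * p / n))
print(flagged)

# The three most influential cars by Cook's distance
print(round(sort(cd, decreasing = TRUE)[1:3], 3))
```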