Executive Summary

This exploratory analysis investigates the effect of transmission type (manual versus automatic) on vehicle fuel efficiency (\(MPG\)) using the mtcars dataset. An initial, unadjusted two-sample \(t\)-test indicates a significant baseline difference: manual vehicles show a \(7.24\text{ MPG}\) advantage over automatics (\(p \approx 0.0014\)). However, this bivariate comparison is exposed to omitted variable bias, because it ignores other vehicle characteristics that differ between the two groups.

To adjust for those characteristics, we used backward stepwise elimination to build a multivariate linear regression model. After controlling for vehicle weight (\(wt\)), gross horsepower (\(hp\)), and number of cylinders (\(cyl\)), the estimated effect of a manual transmission shrinks sharply to \(1.81\text{ MPG}\), with weaker statistical significance. The final model explains most of the variance in fuel efficiency (\(R^2 = 0.866\)), and we conclude that the raw advantage attributed to transmission type is largely driven by differences in weight and engine characteristics between the vehicles.

Load the ggplot2 library and view the mtcars dataset.

library(ggplot2)
View(mtcars)

Exploratory Analysis

See Appendix Figure I for an exploratory scatter plot comparing MPG for automatic and manual transmissions. The plot suggests that MPG is noticeably higher for vehicles with a manual transmission than for those with an automatic.

Statistical Inference

T-test on miles per gallon by transmission type (am: 0 = automatic, 1 = manual)

testResults <- t.test(mpg ~ am, mtcars)
testResults$p.value
## [1] 0.001373638

With a p-value of about 0.0014, the t-test rejects the null hypothesis that the mean MPG difference between transmission types is 0 at the 0.05 level.

testResults$estimate
## mean in group 0 mean in group 1 
##        17.14737        24.39231

The estimated difference between the two group means is 24.39231 - 17.14737 = 7.24494 MPG in favor of manual (group 1).
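As a quick check on the figure above, the difference and its uncertainty can be read directly from the test object. This is a minimal sketch; `mpgDiff` and `ci` are illustrative names that do not appear elsewhere in this report. Note that `t.test` reports its confidence interval for group 0 minus group 1 (automatic minus manual), so an interval that favors manual is negative.

```r
# Welch two-sample t-test of mpg by transmission (am: 0 = automatic, 1 = manual)
testResults <- t.test(mpg ~ am, data = mtcars)

# Difference in group means, manual minus automatic
mpgDiff <- unname(diff(testResults$estimate))
mpgDiff

# 95% confidence interval for (automatic - manual); negative values favor manual
ci <- testResults$conf.int
ci
```

An interval that excludes 0 is consistent with the rejection of the null hypothesis reported above.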

Regression Analysis

Linear model with all other variables as predictors

all_model_reg <- lm(mpg ~ ., data = mtcars)
summary(all_model_reg)  
## 
## Call:
## lm(formula = mpg ~ ., data = mtcars)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -3.4506 -1.6044 -0.1196  1.2193  4.6271 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)  
## (Intercept) 12.30337   18.71788   0.657   0.5181  
## cyl         -0.11144    1.04502  -0.107   0.9161  
## disp         0.01334    0.01786   0.747   0.4635  
## hp          -0.02148    0.02177  -0.987   0.3350  
## drat         0.78711    1.63537   0.481   0.6353  
## wt          -3.71530    1.89441  -1.961   0.0633 .
## qsec         0.82104    0.73084   1.123   0.2739  
## vs           0.31776    2.10451   0.151   0.8814  
## am           2.52023    2.05665   1.225   0.2340  
## gear         0.65541    1.49326   0.439   0.6652  
## carb        -0.19942    0.82875  -0.241   0.8122  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2.65 on 21 degrees of freedom
## Multiple R-squared:  0.869,  Adjusted R-squared:  0.8066 
## F-statistic: 13.93 on 10 and 21 DF,  p-value: 3.793e-07

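None of the coefficients has a p-value below 0.05, even though the overall F-test is highly significant (p-value: 3.793e-07). This pattern suggests that the predictors are strongly correlated with one another, so the full model cannot tell us which variables matter most.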
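One way to probe whether correlated predictors explain the lack of individually significant coefficients is to compute variance inflation factors (VIFs). The sketch below uses only base R and relies on the identity that the VIF of each predictor equals the corresponding diagonal entry of the inverse predictor correlation matrix. The names `predictors` and `vif` are illustrative, and this check is an addition for illustration, not part of the original analysis.

```r
# Predictors only (drop the response, mpg)
predictors <- mtcars[, setdiff(names(mtcars), "mpg")]

# VIF_j = 1 / (1 - R_j^2), which equals the j-th diagonal entry of the
# inverse of the predictor correlation matrix
vif <- setNames(diag(solve(cor(predictors))), names(predictors))

# Largest values first; values well above 5 to 10 are a common warning sign
round(sort(vif, decreasing = TRUE), 2)
```

Large VIFs for several predictors would support reducing the model before interpreting individual coefficients.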

Backward selection with step(), which removes terms by AIC rather than by p-value, to find a smaller model

stepFit <- step(all_model_reg)
summary(stepFit)

The new model has 4 predictors: cylinders (entered as a categorical variable), horsepower, weight, and transmission. Its R-squared of 0.8659 means the model explains about 87% of the variance in MPG. Not every coefficient is individually significant at the 0.05 level, though; the transmission effect in particular is supported by weaker evidence than in the unadjusted t-test. Holding the other variables constant, the coefficients indicate:

  1. Cylinders: a 6-cylinder engine lowers MPG by 3.03 relative to a 4-cylinder engine, and an 8-cylinder engine lowers it by 2.16 relative to a 4-cylinder engine
  2. Horsepower: each additional unit of horsepower lowers MPG by 0.0321 (a 3.21 MPG drop per 100 horsepower)
  3. Weight: each additional 1,000 lbs of weight lowers MPG by 2.5
  4. Transmission: a manual transmission raises MPG by 1.81 relative to an automatic
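The separate 6- and 8-cylinder effects imply that cylinders entered the model as a categorical variable. The conversion step is not shown in this report, so the sketch below is an assumption: it fits one specification that is consistent with the reported coefficients and prints the per-coefficient p-values so they can be checked against the 0.05 threshold. `reportedFit` and `r2` are illustrative names.

```r
# Assumption: cyl enters as a categorical variable, which the separate
# 6- and 8-cylinder coefficients in the text imply
reportedFit <- lm(mpg ~ factor(cyl) + hp + wt + am, data = mtcars)

# Estimates, standard errors, t values and p-values per coefficient
round(coef(summary(reportedFit)), 4)

# Share of variance in mpg explained by the model
r2 <- summary(reportedFit)$r.squared
r2
```

Reading the `Pr(>|t|)` column row by row shows which individual effects are well supported and which are not.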

Residuals & Diagnostics

Residual plots: see Appendix Figure II

The plots suggest:

  1. The Residuals vs. Fitted plot shows no systematic pattern, which supports the linearity assumption and gives no sign of dependence among the residuals
  2. The points of the Normal Q-Q plot lie close to the line, which suggests the residuals are approximately normally distributed
  3. The Scale-Location plot shows a roughly random spread, which supports the constant variance assumption
  4. All points in the Residuals vs. Leverage plot fall within the 0.5 Cook's distance contours, which suggests that no single observation is unduly influential
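
Influence check: number of DFBETAS values greater than 1 in absolute value

sum(abs(dfbetas(stepFit)) > 1)
## [1] 1

One DFBETAS value exceeds 1 in absolute value, so a single observation noticeably shifts one coefficient estimate and deserves a closer look.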
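To see which car and which coefficient lie behind a large DFBETAS value, the flagged cells can be listed by name. This is a sketch under an assumption: it refits the four-predictor model with cylinders as a factor (a specification consistent with the reported coefficients), so the flagged count may differ from the value printed above if `stepFit` was fit differently. `fit`, `infl`, `flags`, and `flagged` are illustrative names.

```r
# Assumed specification of the final model (cyl treated as categorical)
fit <- lm(mpg ~ factor(cyl) + hp + wt + am, data = mtcars)

# DFBETAS: standardized change in each coefficient when one car is left out
infl <- dfbetas(fit)

# Row and column positions of every value above 1 in absolute value
flags <- which(abs(infl) > 1, arr.ind = TRUE)

# Translate positions into car names and coefficient names
flagged <- data.frame(
  car = rownames(infl)[flags[, "row"]],
  coefficient = colnames(infl)[flags[, "col"]]
)
flagged
```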

Conclusion

There is a difference in MPG based on transmission type. After adjusting for weight, horsepower, and number of cylinders, a manual transmission is associated with a slight gain of about 1.81 MPG, far smaller than the unadjusted difference of 7.24 MPG. Weight, horsepower, and number of cylinders account for most of the variation in MPG.

Appendix Figure I
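The figure itself is not reproduced here. A plot of the kind described in the Exploratory Analysis section could be generated along the following lines; this is a sketch, and the labels, jitter width, and the names `plotData` and `fig1` are illustrative choices rather than the original figure's settings.

```r
library(ggplot2)

# Label the transmission codes (am: 0 = automatic, 1 = manual)
plotData <- transform(
  mtcars,
  transmission = factor(am, levels = c(0, 1), labels = c("Automatic", "Manual"))
)

# One point per car; horizontal jitter only, so mpg values are not altered
fig1 <- ggplot(plotData, aes(x = transmission, y = mpg, colour = transmission)) +
  geom_jitter(width = 0.1, height = 0) +
  labs(x = "Transmission", y = "Miles per gallon", title = "MPG by transmission type")

print(fig1)
```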

Appendix Figure II
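The figure itself is not reproduced here. The four diagnostic panels discussed in the Residuals & Diagnostics section are the defaults of `plot()` for a fitted `lm` object. The sketch below assumes the four-predictor model with cylinders as a factor, a specification consistent with the reported coefficients; `fit` and `oldPar` are illustrative names.

```r
# Assumed specification of the final model (cyl treated as categorical)
fit <- lm(mpg ~ factor(cyl) + hp + wt + am, data = mtcars)

# Arrange the four default diagnostic plots in a 2 x 2 grid:
# Residuals vs Fitted, Normal Q-Q, Scale-Location, Residuals vs Leverage
oldPar <- par(mfrow = c(2, 2))
plot(fit)

# Restore the previous graphics settings
par(oldPar)
```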