Using regression and the mtcars dataset in RStudio, we give one possible answer to this question.

Summary

The simple linear regression coefficient is 7.245, suggesting that a manual transmission increases mpg by 7.245 over the 17.147 mpg average for an automatic. The nested model fit shows that transmission type does not have a significant effect once the other nine predictors are included. The generalized linear model suggests that a manual transmission increases mpg by 3.5%. The ANOVA of the confounding and interaction of transmission type with weight has a p-value of 0.001, and thus the transmission variable is necessary to the model.
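As a quick check on the first two figures, the following sketch (base R only; the object names are illustrative and not taken from the appendix) fits the simple regression and compares its coefficients with the raw group means.

```r
# Simple regression of mpg on transmission type (0 = automatic, 1 = manual)
fit_am <- lm(mpg ~ factor(am), data = mtcars)

# Intercept = mean mpg for automatics; slope = manual mean minus automatic mean
auto_mean <- unname(coef(fit_am)[1])
manual_gain <- unname(coef(fit_am)[2])

# The same quantities from the raw group means
group_means <- tapply(mtcars$mpg, mtcars$am, mean)

round(c(auto = auto_mean, gain = manual_gain, manual = auto_mean + manual_gain), 3)
round(group_means, 3)
```

The sum of the two coefficients is the sample mean mpg for manual cars, so the regression here is just a comparison of two group means.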

Model Selection

The process starts with a nested linear model fit, beginning with mpg versus transmission type and adding one variable at a time. However, since some of the variables are continuous and some are factors, we will use variance inflation factors (VIF) on the continuous variables and generalized linear models (glm) on the factors.

Nested Model Fit

The first model (fit1) is the linear model of mpg on the factor am. The fit is then updated by adding each additional variable in the following order: factor(cyl), disp, hp, drat, wt, qsec, factor(vs), factor(gear), and finally factor(carb).

Res.Df   RSS        Df   Sum of Sq     F            Pr(>F)
30       720.8966   NA   NA            NA           NA
28       264.4957   2    456.4009213   28.4296590   0.0000079
27       230.4599   1    34.0358025    4.2402467    0.0572763
26       183.0392   1    47.4206687    5.9077595    0.0280914
25       182.3812   1    0.6580252     0.0819781    0.7785511
24       150.1005   1    32.2806726    4.0215892    0.0633096
23       141.2059   1    8.8946071     1.1081075    0.3091566
22       139.0230   1    2.1828581     0.2719447    0.6096443
20       134.0015   2    5.0215145     0.3127950    0.7360567
15       120.4027   5    13.5988573    0.3388344    0.8814442

The ANOVA test suggests that cyl is significant at the 0.001 level, hp at the 0.05 level, and disp and wt at the 0.1 level. This fitting process suggests that rear axle ratio (drat), 1/4 mile time (qsec), engine shape (vs), number of forward gears (gear), and number of carburetors (carb) do not appear to be necessary or don’t add much to the model.
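Because these F tests are sequential, the same sums of squares can also be read from a single call to anova() on the largest model. A minimal sketch, assuming the same variable order as fit1 through fit10 (the names full_fit and seq_tab are illustrative):

```r
# Largest model, with terms in the order they were added above
full_fit <- lm(mpg ~ factor(am) + factor(cyl) + disp + hp + drat + wt + qsec +
                 factor(vs) + factor(gear) + factor(carb),
               data = mtcars)

# Sequential (Type I) ANOVA: each row tests a term given the terms listed before it
seq_tab <- as.data.frame(anova(full_fit))

# Degrees of freedom, sum of squares, and p-value for each added term
round(seq_tab[, c("Df", "Sum Sq", "Pr(>F)")], 4)
```

Since the results depend on the order in which terms enter, reordering the formula is a cheap way to see how sensitive the conclusions are to that choice.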

VIF, or Variance Inflation Factors

We drop the factor variables (transmission type, number of cylinders, engine shape, number of forward gears, and number of carburetors) to determine which of the continuous variables are significant to the model. We do this by taking the square root of the VIF values. Our criterion is that any value greater than 2 is considered significant. Strictly speaking, VIF measures how much collinearity with the other predictors inflates the variance of a coefficient, so this step is a screening heuristic rather than a significance test.

Variable   sqrt(VIF)
disp       3.018
hp         2.281
drat       1.524
wt         2.648
qsec       1.787

From the table we see that disp, hp, and wt each have square root values greater than two.
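For reference, the VIF of a predictor is 1 / (1 - R²), where R² comes from regressing that predictor on the remaining predictors. The sketch below computes it in base R without the car package; the helper name vif_by_hand is hypothetical.

```r
# Continuous predictors used in the VIF step
predictors <- c("disp", "hp", "drat", "wt", "qsec")

# VIF_j = 1 / (1 - R^2_j), with R^2_j from regressing predictor j on the others
vif_by_hand <- function(target, data, all_vars) {
  others <- setdiff(all_vars, target)
  fit <- lm(reformulate(others, response = target), data = data)
  1 / (1 - summary(fit)$r.squared)
}

vif_values <- sapply(predictors, vif_by_hand, data = mtcars, all_vars = predictors)

# Square roots, for comparison with the threshold of 2
round(sqrt(vif_values), 3)
```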

GLM of the Factor variables

We shall use generalized linear models to test whether any of the factor variables are significant against mpg. The variable am uses 0 and 1 to represent automatic and manual transmissions, respectively, so we use the “binomial” family in the glm. The variable vs is also treated as binomial, since 0 is a V-shaped engine and 1 is the straight-line type. Since the number of cylinders (cyl), the number of gears (gear), and the number of carburetors (carb) each take multiple integer values, we use the “poisson” family for them in the ANOVA.

factor                      p_value   mpg_coef
transmission type           0.00023   1.03502
number of cylinders         0.00045   0.95706
v-shaped or straight-line   0.00002   1.03924
number of gears             0.30889   1.01668
number of carburetors       0.00215   0.94191

From the table we see that transmission type is significant and that a manual transmission increases mpg by 3.5%. The number of cylinders is significant, and increasing the number of cylinders decreases mpg by 4.3%. The vs variable is significant, and a straight-line engine adds about 3.9% to mpg. The number of gears does not appear to be significant. The number of carburetors is significant, and for each additional carburetor we expect a decrease of around 5.8% in mpg.
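Each multiplicative coefficient converts to a percent change as (coefficient - 1) × 100. A small sketch using the mpg_coef values copied from the table (the vector names are illustrative):

```r
# Multiplicative coefficients copied from the mpg_coef column of the table
mpg_coef <- c(am = 1.03502, cyl = 0.95706, vs = 1.03924,
              gear = 1.01668, carb = 0.94191)

# Percent change implied by each coefficient (positive = increase)
pct_change <- (mpg_coef - 1) * 100
round(pct_change, 1)
```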

Reduced Model

Using the regression techniques described above, the nested model fit suggested, and VIF and GLM confirmed, that the number of cylinders, horsepower, displacement, and weight are significant to the model. When we subset mtcars to these variables and rerun the linear regression, we find:

Variable   Estimate   Std. Error   t value   Pr(>|t|)
cyl        5.356      1.418        3.778     0.001
hp         -0.031     0.036        -0.871    0.391
disp       -0.121     0.023        -5.350    0.000
wt         5.691      2.327        2.446     0.021

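Because this model is fit without an intercept (the - 1 in the formula), the coefficients describe a regression through the origin. In this fit, the number of cylinders, the displacement, and the weight are the significant variables, while horsepower is not (p = 0.391).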
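The table above comes from a fit without an intercept. The sketch below, with illustrative object names, fits the reduced model both with and without an intercept so the two sets of estimates can be compared side by side.

```r
# Reduced model through the origin (the "- 1" removes the intercept)
fit_no_int <- lm(mpg ~ cyl + hp + disp + wt - 1, data = mtcars)

# Same predictors with the usual intercept, for comparison
fit_with_int <- lm(mpg ~ cyl + hp + disp + wt, data = mtcars)

round(summary(fit_no_int)$coefficients, 3)
round(summary(fit_with_int)$coefficients, 3)
```

Estimates and p-values can shift noticeably when the intercept is dropped, so it is worth looking at both fits before interpreting the signs of the coefficients.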

Is transmission type a confounder or an interaction with weight?

The question was whether transmission type has an effect on mpg, yet the previous results suggested dropping it from the model. We therefore test whether am acts as a confounder or an interaction variable in our final model. We tie this to weight because automatic transmissions tend to weigh more than manual transmissions.

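Comparing the additive model (mpg ~ factor(am) + wt) with the interaction model (mpg ~ factor(am) * wt) by ANOVA gives a p-value of 0.00102, which suggests that the interaction between weight and transmission type is necessary in the model.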
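To see what the interaction means in practice, this sketch (illustrative names) compares the additive and interaction models and recovers the separate mpg-versus-weight slope for each transmission type.

```r
# Additive model: one weight slope shared by both transmission types
fit_add <- lm(mpg ~ factor(am) + wt, data = mtcars)

# Interaction model: the weight slope is allowed to differ by transmission type
fit_inter <- lm(mpg ~ factor(am) * wt, data = mtcars)

# F test of the interaction term
p_interaction <- anova(fit_add, fit_inter)[2, "Pr(>F)"]

# Weight slopes implied by the interaction model
b <- coef(fit_inter)
slope_auto <- unname(b["wt"])
slope_manual <- unname(b["wt"] + b["factor(am)1:wt"])

round(c(p = p_interaction, auto = slope_auto, manual = slope_manual), 5)
```

The manual slope obtained this way equals the slope from fitting mpg ~ wt on the manual cars alone, which is why the plot can draw one regression line per transmission type.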

Plot of transmission type

Appendix: R code

library(datasets)
library(dplyr)
library(ggplot2)
library(car)
library(broom)
library(tibble)
library(tidyr)
library(purrr)
library(knitr)
# Nested model fit: start with mpg ~ am, then add one variable at a time
fit1 <- lm(mpg ~ factor(am), data = mtcars)
fit2 <- update(fit1, mpg ~ factor(am) + factor(cyl))
fit3 <- update(fit1, mpg ~ factor(am) + factor(cyl) + disp)
fit4 <- update(fit1, mpg ~ factor(am) + factor(cyl) + disp + hp)
fit5 <- update(fit1, mpg ~ factor(am) + factor(cyl) + disp + hp + drat)
fit6 <- update(fit1, mpg ~ factor(am) + factor(cyl) + disp + hp + drat + wt)
fit7 <- update(fit1, mpg ~ factor(am) + factor(cyl) + disp + hp + drat + wt + qsec)
fit8 <- update(fit1, mpg ~ factor(am) + factor(cyl) + disp + hp + drat + wt + qsec + factor(vs))
fit9 <- update(fit1, mpg ~ factor(am) + factor(cyl) + disp + hp + drat + wt + qsec + factor(vs) + factor(gear))
fit10 <- update(fit1, mpg ~ factor(am) + factor(cyl) + disp + hp + drat + wt + qsec + factor(vs) + factor(gear) + factor(carb))
anova_fit <- data.frame(anova(fit1,fit2,fit3,fit4,fit5,fit6,fit7,fit8,fit9,fit10))
knitr::kable(anova_fit)
# VIF on the continuous variables only (square root, rounded once)
df_1 <- mtcars %>% select(mpg, disp:qsec)
fit_continuous <- lm(mpg ~ ., data = df_1)
continuous_vif <- round(sqrt(vif(fit_continuous)), 3)
knitr::kable(continuous_vif)
# Factor variables: GLM for the p-value, log-linear lm for the multiplicative coefficient
log_am <- glm(mtcars$am ~ mtcars$mpg, family = "binomial")
am_coef <- summary(log_am)$coefficients
am_exp <- round(exp(coef(lm(log(mtcars$am+1) ~ mtcars$mpg))),5)
am_anova <- anova(log_am, test = "Chisq")
am_p_val <- round(anova(log_am, test = "Chisq")[2,5],5)

log_cyl <- glm(mtcars$cyl ~ mtcars$mpg, family = poisson())
cyl_coef <- summary(log_cyl)$coefficients
cyl_exp <- round(exp(coef(lm(log(mtcars$cyl) ~ mtcars$mpg))), 5) 
cyl_anova <- anova(log_cyl, test = "Chisq")
cyl_p_val <- round(anova(log_cyl, test = "Chisq")[2,5],5)

log_vs <- glm(mtcars$vs ~ mtcars$mpg, family = "binomial")
vs_p_val <- round(anova(log_vs, test = "Chisq")[2,5],5)
vs_exp <- round(exp(coef(lm(log(mtcars$vs+1) ~ mtcars$mpg))),5)

log_gear <- glm(mtcars$gear ~ mtcars$mpg, family = poisson())
gear_exp <- round(exp(coef(lm(log(mtcars$gear) ~ mtcars$mpg))),5)
gear_p_val <- round(anova(log_gear, test = "Chisq")[2,5],5)#not sig

log_carb <- glm(mtcars$carb ~ mtcars$mpg, family = poisson())
carb_coef <- summary(log_carb)$coefficients
carb_exp <- round(exp(coef(lm(log(mtcars$carb) ~ mtcars$mpg))), 5) # slope of about 0.942, i.e. roughly a 5.8% decrease
carb_p_val <- round(anova(log_carb, test = "Chisq")[2,5],5)

factors <- c("transmission type", "number of cylinders", "v-shaped or straight-line", "number of gears", "number of carburetors")
p_values <- c(am_p_val, cyl_p_val, vs_p_val, gear_p_val, carb_p_val)
exp_coefficients <- as.matrix(c(am_exp, cyl_exp, vs_exp, gear_exp, carb_exp))[c(2,4,6,8,10),]
factor_results <- data.frame(factor = factors, p_value = p_values, mpg_coef = exp_coefficients)
knitr::kable(factor_results)
# Reduced model on the retained variables, fit without an intercept
sig_mtcars <- mtcars %>% select(mpg, cyl, hp, disp, wt)
final_lm <- round(summary(lm(mpg ~ . -1, data = sig_mtcars))$coefficient,3)
knitr::kable(final_lm)
# Additive (confounder) model versus interaction model for am and wt
am_confounder <- lm(mpg ~ factor(am)+wt, data = mtcars)
am_interaction <- lm(mpg ~ factor(am)*wt, data = mtcars)
summary(am_confounder)$coefficient
summary(am_interaction)$coefficient
p_val <- round(anova(am_confounder,am_interaction)[2,6],5)
# Separate mpg ~ wt lines for manual and automatic cars, used in the plot
man_mtcars <- mtcars %>% filter(am == 1)
auto_mtcars <- mtcars %>% filter(am == 0)
man_line <- lm(mpg ~ wt, data = man_mtcars)
auto_line <- lm(mpg ~ wt, data = auto_mtcars)
ggplot(mtcars, aes(x = wt, y = mpg, colour = factor(am))) + 
        geom_point(size = 4) +
        labs(x = "Weight (in 1000 lbs)",
             y = "MPG", 
             title = "Regression of MPG v. Weight by Transmission Type") +
        scale_color_manual(name="Transmission",
                           labels=c("Automatic","Manual"),
                           values = c("pink","lightblue")) + 
        geom_abline(intercept = coef(man_line)[1], 
                       slope = coef(man_line)[2],
                       size = 1,
                       colour = "blue") + 
        geom_abline(intercept = coef(auto_line)[1], 
                       slope = coef(auto_line)[2],
                       size = 1,
                       colour = "red")