Executive Summary

This analysis uses the mtcars dataset from the R datasets package to examine the relationship between transmission type and fuel efficiency as measured by miles per gallon (MPG).

To estimate the difference in MPG between manual and automatic transmissions, the analysis identifies potential confounding variables and includes them in a multivariable regression model, so that the estimated transmission effect is adjusted for those variables rather than reflecting them. Stepwise model selection based on AIC was then used to choose the final regression model for quantifying the difference in MPG by transmission type.

The null hypothesis is that there is no difference in fuel efficiency between cars with manual transmission and cars with automatic transmission. The alternative hypothesis is that cars with manual transmission are more fuel efficient and therefore have a higher MPG value than cars with automatic transmission.

The results of the regression model suggest that, after adjusting for confounding variables, there is no statistically significant difference in MPG between cars with automatic transmission and cars with manual transmission (estimated difference 1.81 MPG, p = .206).

Research Questions

  1. After controlling for confounding variables, is there a statistically significant difference in fuel efficiency, measured in miles per gallon (MPG), between cars with manual transmission (MT) and cars with automatic transmission (AT)?

  2. Can the model quantify the difference in MPG between vehicles with manual transmission and vehicles with automatic transmission?

Data Source

The mtcars dataset was extracted from the 1974 Motor Trend US magazine and covers 32 automobiles (1973-74 models). It contains the response variable mpg, the predictor variable am (transmission: 0 = automatic, 1 = manual), and nine other variables that may or may not be confounding variables.

Source: https://www.rdocumentation.org/packages/datasets/versions/3.6.2/topics/mtcars

Exploratory Data Analysis

Examine the structure of the mtcars dataset

First six records of the mtcars dataset
                   mpg   cyl  disp  hp   drat  wt     qsec   vs  am  gear  carb
Mazda RX4          21.0  6    160   110  3.90  2.620  16.46  0   1   4     4
Mazda RX4 Wag      21.0  6    160   110  3.90  2.875  17.02  0   1   4     4
Datsun 710         22.8  4    108   93   3.85  2.320  18.61  1   1   4     1
Hornet 4 Drive     21.4  6    258   110  3.08  3.215  19.44  1   0   3     1
Hornet Sportabout  18.7  8    360   175  3.15  3.440  17.02  0   0   3     2
Valiant            18.1  6    225   105  2.76  3.460  20.22  1   0   3     1
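The model code later in this report refers to a data frame named `cars` with a `Transmission` column, neither of which is defined in the excerpt. A minimal sketch of that preparation step, assuming `Transmission` is a factor recoded from `am` and that `cyl` was converted to a factor (the `cyl6` and `cyl8` terms in Table 3 suggest it was); this is a reconstruction, not the author's code:

```r
# Hypothetical reconstruction of the data-preparation step (not shown in the original).
# Note: naming the result `cars` masks the unrelated built-in datasets::cars.
cars <- mtcars

# Recode am (0 = automatic, 1 = manual) as a labelled factor
cars$Transmission <- factor(cars$am, levels = c(0, 1),
                            labels = c("Automatic", "Manual"))

# Treat number of cylinders as categorical (levels 4, 6, 8)
cars$cyl <- factor(cars$cyl)

# Group sizes by transmission type
table(cars$Transmission)
```

The group counts printed by `table()` should agree with the N column of Table 1.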


Examine the distribution of categorical variables in the mtcars dataset


Examine summary statistics for MPG

Table 1: Summary of MPG by transmission type
Source: R mtcars dataset
Transmission N Min Q1 Mean Median Q3 Max Std
Automatic 19 10.4 14.9 17.1 17.3 19.2 24.4 3.8
Manual 13 15.0 21.0 24.4 22.8 30.4 33.9 6.2
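The statistics in Table 1 can be reproduced with base R alone. A minimal sketch, assuming R's default (type 7) quantiles for Q1 and Q3; the report does not say which tool produced the table:

```r
# Split MPG by transmission type (am: 0 = automatic, 1 = manual)
groups <- split(mtcars$mpg, factor(mtcars$am, levels = c(0, 1),
                                   labels = c("Automatic", "Manual")))

# One row of summary statistics per transmission type
mpg_summary <- t(sapply(groups, function(x) {
  q <- quantile(x, c(0.25, 0.75))  # default type 7 quantiles
  c(N = length(x), Min = min(x), Q1 = q[[1]], Mean = mean(x),
    Median = median(x), Q3 = q[[2]], Max = max(x), Std = sd(x))
}))

round(mpg_summary, 1)
```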


Visualize the distribution of MPG values by transmission type


Test the statistical significance of the difference in mean values of MPG for cars with automatic vs manual transmission

Table 2: Welch two-sample t-test of MPG by transmission type (mtcars dataset)

Dependent Variable  t      df     p        d      95% CI for d
mpg                 -3.77  18.33  .001**   -1.48  [-2.27, -0.67]

Note. * p < .05, ** p < .01, *** p < .001
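The t, df, and p values in Table 2 can be checked with base R's `t.test()`. In the sketch below, computing d with the pooled standard deviation is an assumption about how the table's effect size was obtained (it reproduces -1.48); the confidence interval for d is not recomputed here:

```r
auto   <- mtcars$mpg[mtcars$am == 0]
manual <- mtcars$mpg[mtcars$am == 1]

# Welch two-sample t-test (unequal variances)
welch <- t.test(auto, manual, var.equal = FALSE)

# Cohen's d using the pooled standard deviation (assumed definition)
n1 <- length(auto)
n2 <- length(manual)
sd_pooled <- sqrt(((n1 - 1) * var(auto) + (n2 - 1) * var(manual)) / (n1 + n2 - 2))
cohens_d <- (mean(auto) - mean(manual)) / sd_pooled

round(c(t = unname(welch$statistic), df = unname(welch$parameter),
        p = welch$p.value, d = cohens_d), 3)
```

Note that `welch$conf.int` is the 95% interval for the raw difference in means (roughly -11.3 to -3.2 MPG), which is a different quantity from the interval for d reported in Table 2.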


Interpretation of the exploratory analysis

Without accounting for any potential confounding variables, the summary table and boxplot show that manual transmission is associated with higher MPG. Table 1 shows a mean MPG of 17.1 for cars with automatic transmission and 24.4 for cars with manual transmission, a difference of about 7.2 MPG. The Welch t-test indicates that this difference in means is statistically significant (p = .001). Because the exploratory analysis does not control for confounding variables, the next step is to check the correlation between MPG and the other variables to guide model specification. The final model will show whether the difference in MPG remains after adjusting for the confounding variables available in the mtcars dataset.

Model Selection

The correlation matrix below (Figure 5) shows that, apart from transmission type itself (am), the variables most strongly correlated with MPG in absolute value are weight in 1,000 lbs (wt), number of cylinders (cyl), displacement in cubic inches (disp), gross horsepower (hp), rear axle ratio (drat), engine shape (vs: 0 = V-shaped, 1 = straight), and number of carburetors (carb). In the absence of expert opinion to help identify potential confounding variables, we fit a base model with transmission type and these independent variables, then run a bidirectional stepwise regression, which adds or drops one term at a time and keeps the model with the lowest AIC among those it evaluates.
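Figure 5 is not reproduced in this text. A minimal sketch of the ranking it summarizes, assuming Pearson correlations computed on the raw numeric mtcars columns (the figure's exact method is not stated):

```r
# Correlation of every mtcars variable with mpg
cor_mpg <- cor(mtcars)[, "mpg"]
cor_mpg <- cor_mpg[names(cor_mpg) != "mpg"]  # drop mpg's correlation with itself

# Rank by absolute strength, keeping the sign for interpretation
ranked <- cor_mpg[order(abs(cor_mpg), decreasing = TRUE)]
round(ranked, 2)
```

Negative values (wt, cyl, disp, hp, carb) mean MPG falls as the variable increases; am also ranks highly but enters the model as the variable of interest rather than as a confounder.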

Regression Analysis

Run a base model with the variables most closely correlated with MPG

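```r
# Base model with transmission type and the variables most strongly correlated with MPG
# (assumes `cars` is a copy of mtcars with a Transmission factor and cyl as a factor)
base_model <- lm(mpg ~ as.factor(Transmission) + wt + disp + cyl + hp + drat + vs + carb, data = cars)

# Both-direction stepwise regression; trace = 0 suppresses the step-by-step output
both_model <- step(base_model, direction = "both", trace = 0)
```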
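`step()` ranks candidate models with `extractAIC()`, which for a linear model equals n log(RSS/n) + 2k, where k is the number of estimated coefficients. This differs from `AIC()` only by a constant for a fixed dataset, so both give the same ranking. A sketch that rebuilds the assumed `cars` data frame, runs the same search silently, and checks that formula by hand:

```r
# Hypothetical data preparation (assumed to match the report's `cars`)
cars <- mtcars
cars$Transmission <- factor(cars$am, levels = c(0, 1), labels = c("Automatic", "Manual"))
cars$cyl <- factor(cars$cyl)

base_model <- lm(mpg ~ as.factor(Transmission) + wt + disp + cyl + hp + drat + vs + carb,
                 data = cars)
both_model <- step(base_model, direction = "both", trace = 0)  # silent search

# The AIC criterion used by step() for lm fits: n * log(RSS / n) + 2 * k
aic_by_hand <- function(fit) {
  n <- nobs(fit)
  k <- n - df.residual(fit)  # number of estimated coefficients
  n * log(sum(residuals(fit)^2) / n) + 2 * k
}

c(base = aic_by_hand(base_model), selected = aic_by_hand(both_model))
formula(both_model)
```

Because the search is greedy, the selected model has the lowest AIC among the models visited, which is not guaranteed to be the lowest over every possible subset.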


The model selected by the stepwise procedure (lowest AIC among the models evaluated) is mpg ~ as.factor(Transmission) + wt + cyl + hp. We will use this model to answer the two research questions.
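Written out under R's default treatment coding (reference levels automatic transmission and 4 cylinders, an assumption consistent with the `Manual`, `cyl6` and `cyl8` terms in Table 3), the selected model and the test behind research question 1 are:

```latex
\begin{aligned}
\text{mpg}_i &= \beta_0 + \beta_1\,\text{Manual}_i + \beta_2\,\text{cyl6}_i + \beta_3\,\text{cyl8}_i
              + \beta_4\,\text{wt}_i + \beta_5\,\text{hp}_i + \varepsilon_i, \qquad i = 1, \dots, 32 \\
H_0 &: \beta_1 = 0, \qquad
t = \frac{\hat\beta_1}{\widehat{\text{SE}}(\hat\beta_1)} \approx \frac{1.81}{1.40} \approx 1.3,
\qquad \text{df} = 32 - 6 = 26
\end{aligned}
```

Here \beta_1 is the adjusted difference in mean MPG between manual and automatic cars. The p-value that R's summary() reports for this t statistic, and that Table 3 shows, is two-sided.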

```r
library(rempsyc)  # provides nice_table()

final_model <- lm(mpg ~ as.factor(Transmission) + cyl + wt + hp, data = cars)

# Coefficient estimates, standard errors, t values and p values
stats.table <- as.data.frame(summary(final_model)$coefficients)

# 95% confidence intervals for the regression coefficients
CI <- confint(final_model)

# Prepend a column of term names and append the CI columns
stats.table <- cbind(row.names(stats.table), stats.table, CI)
# Rename the columns appropriately
names(stats.table) <- c("Term", "Estimate", "SE", "t", "p", "95% CI Lower", "95% CI Upper")

# Format the coefficient table
nice_table(stats.table, title = c("Table 3: Final model summary statistics"),
           note = c("* p < .05, ** p < .01, *** p < .001"))
```

Table 3: Final model summary statistics

Term                           Estimate  SE    t      p          95% CI Lower  95% CI Upper
(Intercept)                    33.71     2.60  12.94  < .001***  28.35         39.06
as.factor(Transmission)Manual  1.81      1.40  1.30   .206       -1.06         4.68
cyl6                           -3.03     1.41  -2.15  .041*      -5.92         -0.14
cyl8                           -2.16     2.28  -0.95  .352       -6.86         2.53
wt                             -2.50     0.89  -2.82  .009**     -4.32         -0.68
hp                             -0.03     0.01  -2.35  .027*      -0.06         -0.00

Note. * p < .05, ** p < .01, *** p < .001
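The CI columns in Table 3 follow estimate ± t(0.975, residual df) × SE. A sketch that recomputes the interval for the transmission term by hand, assuming the same hypothetical `cars` preparation; because the interval (-1.06 to 4.68) spans zero, it agrees with p = .206 > .05:

```r
# Hypothetical data preparation (assumed to match the report's `cars`)
cars <- mtcars
cars$Transmission <- factor(cars$am, levels = c(0, 1), labels = c("Automatic", "Manual"))
cars$cyl <- factor(cars$cyl)

final_model <- lm(mpg ~ as.factor(Transmission) + cyl + wt + hp, data = cars)

coefs <- summary(final_model)$coefficients
term  <- "as.factor(Transmission)Manual"
est   <- coefs[term, "Estimate"]
se    <- coefs[term, "Std. Error"]

# 95% CI = estimate +/- t quantile (26 residual df) * standard error
t_crit     <- qt(0.975, df = df.residual(final_model))
ci_by_hand <- est + c(-1, 1) * t_crit * se

rbind(by_hand = ci_by_hand, confint = confint(final_model)[term, ])
```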

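```r
fit <- summary(final_model)  # r.squared, adj.r.squared, fstatistic (value, numdf, dendf)
r_vals <- data.frame(fit$r.squared, fit$adj.r.squared, t(fit$fstatistic))
names(r_vals) <- c("R-squared", "Adj. R-squared", "F", "df1", "df2")
nice_table(r_vals, title = c("Table 4: Final model fit statistics"))
```

Table 4: Final model fit statistics

R-squared  Adj. R-squared  F      df1   df2
0.87       0.84            33.57  5.00  26.00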
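Each value in Table 4 follows from the residual and total sums of squares. A sketch that recomputes them by hand for the selected model, assuming the same hypothetical `cars` preparation, and checks them against `summary()`:

```r
# Hypothetical data preparation (assumed to match the report's `cars`)
cars <- mtcars
cars$Transmission <- factor(cars$am, levels = c(0, 1), labels = c("Automatic", "Manual"))
cars$cyl <- factor(cars$cyl)
final_model <- lm(mpg ~ as.factor(Transmission) + cyl + wt + hp, data = cars)

y   <- cars$mpg
rss <- sum(residuals(final_model)^2)  # residual sum of squares
tss <- sum((y - mean(y))^2)           # total sum of squares
n   <- nobs(final_model)
p   <- length(coef(final_model)) - 1  # predictors, excluding the intercept

r2     <- 1 - rss / tss
adj_r2 <- 1 - (1 - r2) * (n - 1) / (n - p - 1)      # penalizes extra predictors
f_stat <- (r2 / p) / ((1 - r2) / (n - p - 1))       # overall F-test
f_p    <- pf(f_stat, p, n - p - 1, lower.tail = FALSE)

round(c(R2 = r2, adj_R2 = adj_r2, F = f_stat, df1 = p, df2 = n - p - 1), 2)
```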

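```r
library(ggplot2)
# Collect fitted values and residuals from the final model
resid_df <- data.frame(fitted = fitted(final_model), resid = residuals(final_model))
ggplot(resid_df, aes(x = fitted, y = resid)) +
  geom_point() +
  geom_hline(yintercept = 0, linetype = "dashed") +
  labs(title = "Figure 6: Residual vs. Fitted Values Plot", x = "Fitted Values", y = "Residuals")
```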


Model Interpretation

After controlling for the confounding variables vehicle weight, number of cylinders, and horsepower, the model estimates that cars with manual transmission average 1.81 MPG more than cars with automatic transmission, but this difference is not statistically significant (p = .206; 95% CI -1.06 to 4.68). The residual plot (Figure 6) shows no obvious pattern around zero, suggesting that the linear model is a reasonable fit for the data. The model explains approximately 87% of the variance in MPG (R-squared = 0.87); the adjusted R-squared, which penalizes the number of predictors, is 0.84.
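The effect of adjusting for confounders can be seen directly by comparing the transmission coefficient with and without them. A sketch, rebuilding the assumed `cars` data frame so the block runs on its own; the unadjusted coefficient equals the raw 7.2 MPG difference in means from Table 1:

```r
# Hypothetical data preparation (assumed to match the report's `cars`)
cars <- mtcars
cars$Transmission <- factor(cars$am, levels = c(0, 1), labels = c("Automatic", "Manual"))
cars$cyl <- factor(cars$cyl)

unadjusted <- lm(mpg ~ Transmission, data = cars)
adjusted   <- lm(mpg ~ Transmission + cyl + wt + hp, data = cars)

# Manual-vs-automatic coefficient and its p-value in each model
cols <- c("Estimate", "Pr(>|t|)")
compare <- rbind(
  unadjusted = summary(unadjusted)$coefficients["TransmissionManual", cols],
  adjusted   = summary(adjusted)$coefficients["TransmissionManual", cols]
)
round(compare, 3)
```

The unadjusted p-value comes from an equal-variance t statistic, so it differs from the Welch p-value in Table 2; the point is that the estimate shrinks from about 7.2 to 1.8 MPG once weight, cylinders, and horsepower are in the model.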

Conclusion

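The output of the multivariable regression model indicates that, after controlling for confounding variables, there is no statistically significant difference in MPG between cars with manual transmission and cars with automatic transmission. The estimated difference is 1.81 MPG in favor of manual transmission, but its 95% confidence interval (-1.06 to 4.68 MPG) includes zero, so the data do not pin down the size of the transmission effect. Based on this result, the null hypothesis cannot be rejected at the 0.05 significance level.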