Executive summary

In this project I shall investigate and quantify the effect of motor car transmission type (automatic or manual) on fuel economy (miles per gallon, mpg), using the mtcars dataset, which comprises 11 variables for 32 motor car makes and models. I shall also isolate the effect of transmission type on fuel economy from confounding factors such as the weight of the car and the horsepower of its engine, which in turn reflects the number of cylinders and the engine displacement.

Once weight and horsepower are accounted for, motor cars with automatic transmissions and those with manual transmissions show no statistically significant difference in fuel economy. The unadjusted difference of about 7.2 mpg in favour of manual transmissions is more likely to be related to factors such as the weight of the car and the power of the engine.

Preliminary exploration

An initial exploration of the mtcars dataset shows that the predictor under investigation, transmission type, is stored as a numeric variable (0 for automatic, 1 for manual). This needs to be converted to a factor so it can be used as a category for further analysis. Other predictors, such as the number of cylinders, the number of gears, and the number of carburettors, also take discrete values and are more appropriately stored as factors than as continuous numeric variables.
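
A quick way to confirm how these columns are stored, before and after conversion, is sketched below. It works on a copy named `mt` (a name introduced here purely for illustration) so that the built-in dataset is left untouched.

```r
# Work on a copy so the built-in dataset is not modified
data(mtcars)
mt <- mtcars

# In the raw dataset every one of the 11 columns is numeric
sapply(mt, class)

# am is coded 0 = automatic, 1 = manual; convert it to a labelled factor
mt$am <- factor(mt$am, levels = c(0, 1), labels = c("Automatic", "Manual"))

# Confirm the conversion and count the cars in each group
class(mt$am)
table(mt$am)
```

Stating `levels = c(0, 1)` explicitly ties each label to its code rather than relying on sort order.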

Statistical testing

The null hypothesis is that the true difference between the mean fuel economy of motor cars with automatic transmissions and that of motor cars with manual transmissions is zero.

The alternative hypothesis is that the true difference between the mean fuel economy of motor cars with automatic transmissions and that of motor cars with manual transmissions is not zero.

The sample shall be tested using a Welch two-sample t-test, which compares the mean fuel economy (mpg) of motor cars with automatic and manual transmissions and assesses whether the difference between the two transmission types is significant. The sample shall also be fitted with a linear regression model of fuel economy (mpg) against transmission type, and the adjusted R-squared (coefficient of determination) shall be used to measure how much of the variation in fuel economy can be explained by transmission type.
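
The two approaches are closely related. With a single two-level predictor, the regression slope is simply the difference between the two group means. The following minimal sketch checks this; the names `mt`, `group_means`, `fit`, `diff_means` and `slope` are illustrative only.

```r
data(mtcars)
mt <- transform(mtcars,
                am = factor(am, levels = c(0, 1), labels = c("Automatic", "Manual")))

# Mean mpg for each transmission type
group_means <- tapply(mt$mpg, mt$am, mean)

# Simple linear regression of mpg on transmission type
fit <- lm(mpg ~ am, data = mt)

# The intercept is the automatic mean; the slope is manual minus automatic
diff_means <- unname(group_means["Manual"] - group_means["Automatic"])
slope <- unname(coef(fit)["amManual"])
all.equal(diff_means, slope)
```

The point estimates agree, but the p-values in Appendix 2 differ (0.001374 against 0.000285) because the Welch test does not assume equal variances in the two groups, whereas the regression model assumes a common residual variance.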

The mean fuel economy of motor cars by transmission type is:

  • Automatic - 17.1 mpg
  • Manual - 24.4 mpg

The t-statistic is -3.7671 and the p-value of 0.001374 is less than 0.05. The 95% confidence interval for the difference in means (automatic minus manual), -11.280194 to -3.209684, does not contain zero. This means that the null hypothesis is rejected, at least preliminarily. Therefore, before any adjustment for other variables, it can be said that motor cars with manual transmissions average about 7.2 mpg more than those with automatic transmissions (24.4 mpg against 17.1 mpg).
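
The quantities used in this decision can be read directly from the object returned by `t.test`. The sketch below is one way to do so; `mt`, `tt`, `mean_diff`, `ci` and `excludes_zero` are names chosen here for illustration.

```r
data(mtcars)
mt <- transform(mtcars,
                am = factor(am, levels = c(0, 1), labels = c("Automatic", "Manual")))

# Welch two-sample t-test (var.equal = FALSE is the default)
tt <- t.test(mpg ~ am, data = mt)

# Difference in sample means, manual minus automatic
mean_diff <- unname(tt$estimate[2] - tt$estimate[1])

# The interval is for automatic minus manual, so both bounds are negative
ci <- tt$conf.int

# The null hypothesis is rejected at the 5% level when the interval excludes zero
excludes_zero <- ci[1] > 0 || ci[2] < 0

round(mean_diff, 2)
tt$p.value
excludes_zero
```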

Multivariate analysis

However, the adjusted R-squared of this single-predictor model shows that transmission type alone explains only 33.85% of the variation in fuel economy. This indicates that confounding factors such as the weight and horsepower of the car should be controlled for so that the effect of transmission type itself can be quantified. Weight and horsepower are the two most likely influences on fuel economy: moving weight requires energy (fuel), and horsepower is the rate at which energy (fuel) is expended per unit of time. The number of cylinders, engine displacement, and number of carburettors all contribute to the car’s horsepower, and the quarter-mile time (qsec), a measure of acceleration, is itself a function of the car’s weight and horsepower.
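
For weight and horsepower to confound the comparison, they must differ between the two transmission groups. One way to check this is sketched below; `mt` and `by_am` are illustrative names, and this is an exploratory check rather than part of the formal analysis.

```r
data(mtcars)
mt <- transform(mtcars,
                am = factor(am, levels = c(0, 1), labels = c("Automatic", "Manual")))

# Mean weight (in 1000 lbs) and mean gross horsepower for each transmission type
by_am <- aggregate(cbind(wt, hp) ~ am, data = mt, FUN = mean)
by_am
```

If the group means differ noticeably, part of the unadjusted mpg gap may be attributable to weight and horsepower rather than to the transmission itself.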

The adjusted R-squared of this multiple regression model (fuel economy against transmission type, weight, and horsepower) shows that the model explains 82.27% of the variation in fuel economy.
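
The two adjusted R-squared values can be extracted and compared side by side, as in the following sketch. The object names `mt`, `model_am`, `model_adj` and `adj_r2` are illustrative and are not used in the appendices.

```r
data(mtcars)
mt <- transform(mtcars,
                am = factor(am, levels = c(0, 1), labels = c("Automatic", "Manual")))

model_am  <- lm(mpg ~ am, data = mt)            # transmission type only
model_adj <- lm(mpg ~ am + wt + hp, data = mt)  # adding weight and horsepower

# Adjusted R-squared penalises extra predictors, so the two models are comparable
adj_r2 <- c(am_only  = summary(model_am)$adj.r.squared,
            am_wt_hp = summary(model_adj)$adj.r.squared)
round(adj_r2, 4)
```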

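When the car’s weight and horsepower are held constant, the model’s fitted fuel economy by transmission type, evaluated at the intercept (zero weight and zero horsepower, so an extrapolation rather than a realistic mean), is:

  • Automatic - 34.0 mpg
  • Manual - 36.1 mpg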

This adjusted difference of 2.1 mpg is much smaller than the 7.2 mpg difference calculated before weight and horsepower were taken into account. As the p-value for the manual transmission coefficient (0.141) is greater than 0.05, the difference in fuel economy is not considered to be statistically significant and the null hypothesis is not rejected. By contrast, weight (p = 0.0036) and horsepower (p = 0.0005) are both significant predictors.
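
Because the model contains no interaction terms, the adjusted difference is the same at any fixed weight and horsepower. The sketch below illustrates this with two hypothetical cars that share the sample mean weight and horsepower and differ only in transmission; `mt`, `fit`, `newcars`, `pred` and `adjusted_diff` are names introduced for illustration.

```r
data(mtcars)
mt <- transform(mtcars,
                am = factor(am, levels = c(0, 1), labels = c("Automatic", "Manual")))
fit <- lm(mpg ~ am + wt + hp, data = mt)

# Two hypothetical cars: same weight and horsepower, different transmission
newcars <- data.frame(
  am = factor(c("Automatic", "Manual"), levels = levels(mt$am)),
  wt = mean(mt$wt),
  hp = mean(mt$hp)
)
pred <- predict(fit, newdata = newcars)

# The gap between the predictions equals the amManual coefficient
adjusted_diff <- unname(pred[2] - pred[1])
round(adjusted_diff, 2)

# 95% confidence interval for the transmission coefficient
confint(fit)["amManual", ]
```

An interval for the coefficient that spans zero is consistent with the non-significant p-value reported above.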

Conclusion

After adjusting for weight and horsepower, the estimated difference in fuel economy between motor cars with manual transmissions and those with automatic transmissions (about 2.1 mpg) is not statistically significant (p = 0.141). The larger unadjusted difference of about 7.2 mpg is more likely to be related to factors such as the weight of the car and the power of the engine.

Appendices

1. Preliminary exploration code and results

# Load required data
        data(mtcars)
# Convert variables from numeric to factor
        mtcars$am <- factor(mtcars$am, labels = c("Automatic","Manual"))
        mtcars$cyl <- as.factor(mtcars$cyl)
        mtcars$vs <- factor(mtcars$vs, labels = c("V-shaped","Straight"))
        mtcars$gear <- as.factor(mtcars$gear)
        mtcars$carb <- as.factor(mtcars$carb)

2. Statistical testing code and results

# t-test
        t.test(mpg ~ am, data = mtcars)
## 
##  Welch Two Sample t-test
## 
## data:  mpg by am
## t = -3.7671, df = 18.332, p-value = 0.001374
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -11.280194  -3.209684
## sample estimates:
## mean in group Automatic    mean in group Manual 
##                17.14737                24.39231
# Linear regression model
        model1 <- lm(mpg ~ am, data = mtcars)
        summary(model1)
## 
## Call:
## lm(formula = mpg ~ am, data = mtcars)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -9.3923 -3.0923 -0.2974  3.2439  9.5077 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   17.147      1.125  15.247 1.13e-15 ***
## amManual       7.245      1.764   4.106 0.000285 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 4.902 on 30 degrees of freedom
## Multiple R-squared:  0.3598, Adjusted R-squared:  0.3385 
## F-statistic: 16.86 on 1 and 30 DF,  p-value: 0.000285
# Multivariate linear regression model
        model2 <- lm(mpg ~ am + wt + hp, data = mtcars)
        summary(model2)
## 
## Call:
## lm(formula = mpg ~ am + wt + hp, data = mtcars)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -3.4221 -1.7924 -0.3788  1.2249  5.5317 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 34.002875   2.642659  12.867 2.82e-13 ***
## amManual     2.083710   1.376420   1.514 0.141268    
## wt          -2.878575   0.904971  -3.181 0.003574 ** 
## hp          -0.037479   0.009605  -3.902 0.000546 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2.538 on 28 degrees of freedom
## Multiple R-squared:  0.8399, Adjusted R-squared:  0.8227 
## F-statistic: 48.96 on 3 and 28 DF,  p-value: 2.908e-11

3. Individual linear models

# Study pairwise relationships between mpg, transmission type, weight and horsepower
        mtcars_pairs <- mtcars[, c("mpg", "am", "wt", "hp")]
        par(mar = c(1, 1, 1, 1))
        pairs(mtcars_pairs, panel = panel.smooth, col = 9)

4. Residual plots

# Study residual plots
        par(mar = c(2, 2, 2, 2))        
        par(mfrow = c(2,2))
        plot(model2)