In this report I aim to answer two questions:
“Is an automatic or manual transmission better for MPG?”
“Quantify the MPG difference between automatic and manual transmissions”
My main finding is that the MPG difference between manual and automatic transmissions is better explained by other variables than by the transmission type itself.
data(mtcars)
head(mtcars)
## mpg cyl disp hp drat wt qsec vs am gear carb
## Mazda RX4 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4
## Mazda RX4 Wag 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4
## Datsun 710 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1
## Hornet 4 Drive 21.4 6 258 110 3.08 3.215 19.44 1 0 3 1
## Hornet Sportabout 18.7 8 360 175 3.15 3.440 17.02 0 0 3 2
## Valiant 18.1 6 225 105 2.76 3.460 20.22 1 0 3 1
Description sourced from the R documentation:
The data was extracted from the 1974 Motor Trend US magazine, and comprises fuel consumption and 10 aspects of automobile design and performance for 32 automobiles (1973–74 models).
| Var | Desc |
|---|---|
| mpg | Miles/(US) gallon |
| cyl | Number of cylinders |
| disp | Displacement (cu.in.) |
| hp | Gross horsepower |
| drat | Rear axle ratio |
| wt | Weight (1000 lbs) |
| qsec | 1/4 mile time |
| vs | Engine (0 = V-shaped, 1 = straight) |
| am | Transmission (0 = automatic, 1 = manual) |
| gear | Number of forward gears |
| carb | Number of carburetors |
vs and am are categorical variables, so I will convert those columns from numeric to factor. I will do the same for gear, cyl, and carb. It could be argued that these three are numeric variables, but each takes only a few distinct values, so in this case I will treat them as categorical.
mtcars$cyl <- as.factor(mtcars$cyl)
mtcars$gear <- as.factor(mtcars$gear)
mtcars$carb <- as.factor(mtcars$carb)
mtcars$vs <- factor(mtcars$vs, labels=c("V-shaped", "Straight"))
mtcars$am <- factor(mtcars$am, labels=c("Automatic", "Manual"))
I will also rename the columns to more readable names:
names(mtcars) <- c("mpg", "cylinders", "displacement", "horsepower", "rear_axle_ratio", "weight", "quarter_mile_time", "engine", "transmission", "gears", "carburetors")
str(mtcars)
## 'data.frame': 32 obs. of 11 variables:
## $ mpg : num 21 21 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2 ...
## $ cylinders : Factor w/ 3 levels "4","6","8": 2 2 1 2 3 2 3 1 1 2 ...
## $ displacement : num 160 160 108 258 360 ...
## $ horsepower : num 110 110 93 110 175 105 245 62 95 123 ...
## $ rear_axle_ratio : num 3.9 3.9 3.85 3.08 3.15 2.76 3.21 3.69 3.92 3.92 ...
## $ weight : num 2.62 2.88 2.32 3.21 3.44 ...
## $ quarter_mile_time: num 16.5 17 18.6 19.4 17 ...
## $ engine : Factor w/ 2 levels "V-shaped","Straight": 1 1 2 2 1 2 1 2 2 2 ...
## $ transmission : Factor w/ 2 levels "Automatic","Manual": 2 2 2 1 1 1 1 1 1 1 ...
## $ gears : Factor w/ 3 levels "3","4","5": 2 2 2 1 1 1 1 2 2 2 ...
## $ carburetors : Factor w/ 6 levels "1","2","3","4",..: 4 4 1 1 2 1 4 2 2 4 ...
summary(mtcars)
## mpg cylinders displacement horsepower rear_axle_ratio
## Min. :10.40 4:11 Min. : 71.1 Min. : 52.0 Min. :2.760
## 1st Qu.:15.43 6: 7 1st Qu.:120.8 1st Qu.: 96.5 1st Qu.:3.080
## Median :19.20 8:14 Median :196.3 Median :123.0 Median :3.695
## Mean :20.09 Mean :230.7 Mean :146.7 Mean :3.597
## 3rd Qu.:22.80 3rd Qu.:326.0 3rd Qu.:180.0 3rd Qu.:3.920
## Max. :33.90 Max. :472.0 Max. :335.0 Max. :4.930
## weight quarter_mile_time engine transmission gears
## Min. :1.513 Min. :14.50 V-shaped:18 Automatic:19 3:15
## 1st Qu.:2.581 1st Qu.:16.89 Straight:14 Manual :13 4:12
## Median :3.325 Median :17.71 5: 5
## Mean :3.217 Mean :17.85
## 3rd Qu.:3.610 3rd Qu.:18.90
## Max. :5.424 Max. :22.90
## carburetors
## 1: 7
## 2:10
## 3: 3
## 4:10
## 6: 1
## 8: 1
No NA values are present, so no imputation is needed. The per-column NA counts confirm this:
sapply(mtcars, function(x) sum(is.na(x)))
## mpg cylinders displacement horsepower
## 0 0 0 0
## rear_axle_ratio weight quarter_mile_time engine
## 0 0 0 0
## transmission gears carburetors
## 0 0 0
library(ggplot2)

# Overall distribution of MPG
ggplot(mtcars, aes(x = mpg)) +
  geom_density() +
  labs(title = "Distribution of MPG", x = "MPG", y = "Density")

# Distribution of MPG split by transmission type
ggplot(mtcars, aes(x = mpg, fill = transmission)) +
  geom_density(alpha = 0.5) +
  labs(title = "Distribution of MPG by Transmission", x = "MPG", y = "Density")
Automatic cars appear to have a narrower range of MPG values, centred at a lower MPG than manual cars, while manual cars span a wider range. We can check the difference in means with a Welch two-sample t-test at a 95% confidence level. The null hypothesis is that the mean MPG of automatic and manual cars is the same. Because the test below uses alternative = "less", the alternative hypothesis is one-sided: the mean MPG of automatic cars is lower than that of manual cars.
t.test(mpg ~ transmission, data = mtcars, conf.level = 0.95, alternative = "less")
##
## Welch Two Sample t-test
##
## data: mpg by transmission
## t = -3.7671, df = 18.332, p-value = 0.0006868
## alternative hypothesis: true difference in means between group Automatic and group Manual is less than 0
## 95 percent confidence interval:
## -Inf -3.913256
## sample estimates:
## mean in group Automatic mean in group Manual
## 17.14737 24.39231
The p-value is about 0.0007, so we can reject the null hypothesis that the mean MPG of automatic and manual cars is the same, in favour of the alternative hypothesis that the mean MPG of automatic cars (17.15) is lower than that of manual cars (24.39).
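The one-sided test above answers "is it lower?" but only bounds the size of the difference on one side. A minimal sketch of a two-sided version is given here to quantify the difference with a full confidence interval. The name `cars` is a hypothetical working copy of the built-in data, used so that the renamed `mtcars` above is left untouched.

```r
# Work on a fresh copy of the built-in dataset (hypothetical name `cars`).
cars <- datasets::mtcars
cars$am <- factor(cars$am, labels = c("Automatic", "Manual"))

# Two-sided Welch t-test; "two.sided" is the default alternative.
two_sided <- t.test(mpg ~ am, data = cars, conf.level = 0.95)

# Difference in group means, Manual minus Automatic.
mean_diff <- unname(two_sided$estimate[2] - two_sided$estimate[1])

print(mean_diff)
# Note: the interval is reported for Automatic minus Manual,
# so a fully negative interval means manual cars have the higher mean.
print(two_sided$conf.int)
print(two_sided$p.value)
```

Since the test statistic is the same, the two-sided p-value should be twice the one-sided value reported above.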
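However, this difference may really reflect the relationship between mpg and other variables that happen to differ between automatic and manual cars. In other words, transmission may be confounded with other aspects of the car.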
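One rough way to see whether such confounding is plausible is to compare other variables across the two transmission groups. The sketch below does this for weight and horsepower only, which is my choice of illustrative variables; `cars` is again a hypothetical working copy with the original column names.

```r
# Fresh copy of the built-in data with original column names.
cars <- datasets::mtcars
cars$am <- factor(cars$am, labels = c("Automatic", "Manual"))

# Mean weight (1000 lbs) and gross horsepower within each transmission group.
group_means <- aggregate(cbind(wt, hp) ~ am, data = cars, FUN = mean)
print(group_means)

# Correlation of mpg with each of the two candidate confounders.
confounder_cor <- cor(cars$mpg, cars[, c("wt", "hp")])
print(confounder_cor)
```

If a variable both differs between the groups and correlates with mpg, part of the raw transmission effect may be attributable to it.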
model <- lm(mpg ~ transmission, data = mtcars)
summary(model)
##
## Call:
## lm(formula = mpg ~ transmission, data = mtcars)
##
## Residuals:
## Min 1Q Median 3Q Max
## -9.3923 -3.0923 -0.2974 3.2439 9.5077
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 17.147 1.125 15.247 1.13e-15 ***
## transmissionManual 7.245 1.764 4.106 0.000285 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 4.902 on 30 degrees of freedom
## Multiple R-squared: 0.3598, Adjusted R-squared: 0.3385
## F-statistic: 16.86 on 1 and 30 DF, p-value: 0.000285
Fitting a linear model of mpg on transmission alone suggests that moving from automatic (the baseline level) to manual is associated with an increase of 7.245 mpg. This is simply the difference between the two group means (24.392 - 17.147). However, this is not a very good model: the R-squared value is only 0.3598, meaning that only 35.98% of the variance in mpg is explained by transmission.
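With a single two-level factor as the predictor, the intercept is the mean of the baseline group and the slope is the difference between the group means. The following sketch checks this numerically, using a hypothetical working copy `cars` with the original column names.

```r
# Fresh copy of the built-in data with original column names.
cars <- datasets::mtcars
cars$am <- factor(cars$am, labels = c("Automatic", "Manual"))

# Mean mpg per transmission group.
group_means <- tapply(cars$mpg, cars$am, mean)

# Simple regression of mpg on transmission.
fit <- lm(mpg ~ am, data = cars)

# Intercept should equal the Automatic mean.
intercept_matches <- all.equal(
  unname(coef(fit)["(Intercept)"]),
  unname(group_means["Automatic"])
)

# Slope should equal Manual mean minus Automatic mean.
slope_matches <- all.equal(
  unname(coef(fit)["amManual"]),
  unname(group_means["Manual"] - group_means["Automatic"])
)

print(c(intercept = intercept_matches, slope = slope_matches))
```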
ggplot(model, aes(x = .fitted, y = .resid)) +
  geom_point() +
  geom_hline(yintercept = 0, linetype = "dashed") +
  xlab("Fitted Values") +
  ylab("Residuals")
model <- lm(mpg ~ ., data=mtcars)
summary(model)
##
## Call:
## lm(formula = mpg ~ ., data = mtcars)
##
## Residuals:
## Min 1Q Median 3Q Max
## -3.5087 -1.3584 -0.0948 0.7745 4.6251
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 23.87913 20.06582 1.190 0.2525
## cylinders6 -2.64870 3.04089 -0.871 0.3975
## cylinders8 -0.33616 7.15954 -0.047 0.9632
## displacement 0.03555 0.03190 1.114 0.2827
## horsepower -0.07051 0.03943 -1.788 0.0939 .
## rear_axle_ratio 1.18283 2.48348 0.476 0.6407
## weight -4.52978 2.53875 -1.784 0.0946 .
## quarter_mile_time 0.36784 0.93540 0.393 0.6997
## engineStraight 1.93085 2.87126 0.672 0.5115
## transmissionManual 1.21212 3.21355 0.377 0.7113
## gears4 1.11435 3.79952 0.293 0.7733
## gears5 2.52840 3.73636 0.677 0.5089
## carburetors2 -0.97935 2.31797 -0.423 0.6787
## carburetors3 2.99964 4.29355 0.699 0.4955
## carburetors4 1.09142 4.44962 0.245 0.8096
## carburetors6 4.47757 6.38406 0.701 0.4938
## carburetors8 7.25041 8.36057 0.867 0.3995
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.833 on 15 degrees of freedom
## Multiple R-squared: 0.8931, Adjusted R-squared: 0.779
## F-statistic: 7.83 on 16 and 15 DF, p-value: 0.000124
When all the variables are included, changing the transmission from automatic to manual is associated with an increase of only 1.212 mpg, holding the other variables fixed. This suggests that most of the raw difference in mpg is due to factors other than the transmission. The high p-value of 0.71 means we cannot attribute a change in mpg to transmission in this model.
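A confidence interval makes the uncertainty around the 1.212 estimate more concrete than the p-value alone. The sketch below refits the same full model on a hypothetical working copy `cars` (original column names, so the coefficient is called `amManual` rather than `transmissionManual`) and extracts the 95% interval with `confint()`.

```r
# Fresh copy of the built-in data with original column names.
cars <- datasets::mtcars

# Same factor conversions as in the report.
factor_cols <- c("cyl", "vs", "gear", "carb")
cars[factor_cols] <- lapply(cars[factor_cols], factor)
cars$am <- factor(cars$am, labels = c("Automatic", "Manual"))

# Full model: mpg regressed on every other column.
full_fit <- lm(mpg ~ ., data = cars)

# 95% confidence interval for the transmission coefficient.
am_ci <- confint(full_fit, "amManual", level = 0.95)
print(am_ci)
```

An interval that spans zero is consistent with the non-significant p-value reported in the model summary.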
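Horsepower and weight show the strongest evidence of an effect on mpg (p-values of about 0.09 each), although neither is significant at the 5% level. This model has an R-squared of 89% (adjusted R-squared 0.779) and a residual standard error of 2.833, compared with 4.902 for the transmission-only model. The residual plot also shows that it is a much better predictor of mpg.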
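With 16 coefficients estimated from only 32 cars, the plain R-squared is optimistic, which is why the adjusted value is worth reporting. As a sketch, the adjusted R-squared can be recomputed by hand from the formula 1 - (1 - R²)(n - 1)/(n - p - 1). The names `cars`, `n`, and `p` are illustrative, and the model is refit on a fresh copy of the data.

```r
# Fresh copy of the built-in data with original column names.
cars <- datasets::mtcars
factor_cols <- c("cyl", "vs", "am", "gear", "carb")
cars[factor_cols] <- lapply(cars[factor_cols], factor)

full_fit <- lm(mpg ~ ., data = cars)

n <- nrow(cars)                    # number of observations (32)
p <- length(coef(full_fit)) - 1    # number of predictors, excluding intercept (16)
r2 <- summary(full_fit)$r.squared

# Adjusted R-squared penalises each additional predictor.
adj_r2 <- 1 - (1 - r2) * (n - 1) / (n - p - 1)

print(c(r_squared = r2, adjusted = adj_r2))
```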
ggplot(model, aes(x = .fitted, y = .resid)) +
  geom_point() +
  geom_hline(yintercept = 0, linetype = "dashed") +
  xlab("Fitted Values") +
  ylab("Residuals")
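As a possible follow-up that is not part of the analysis above, a smaller model can test whether transmission adds anything once weight and horsepower are accounted for. This is only a sketch: the choice of `wt` and `hp` as the baseline predictors is my assumption, motivated by their p-values in the full model, and `cars` is a hypothetical working copy with the original column names.

```r
# Fresh copy of the built-in data with original column names.
cars <- datasets::mtcars
cars$am <- factor(cars$am, labels = c("Automatic", "Manual"))

# Nested models: without and with transmission.
base_fit <- lm(mpg ~ wt + hp, data = cars)
with_am  <- lm(mpg ~ wt + hp + am, data = cars)

# F-test for whether adding transmission improves the fit.
comparison <- anova(base_fit, with_am)
print(comparison)

# p-value for the added transmission term (second row of the table).
am_p_value <- comparison[["Pr(>F)"]][2]
print(am_p_value)
```

A large p-value here would agree with the full model: transmission adds little once weight and horsepower are in the model.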
In conclusion: manual cars in this dataset have better MPG than automatic cars, by about 7.245 mpg on average when transmission is considered alone. Once the other variables are taken into account, the estimated increase from switching to manual falls to approximately 1.212 mpg, and transmission is not a statistically significant contributor to the change in mpg (p = 0.71).