We use R’s “mtcars” dataset for this analysis. The data were extracted from the 1974 Motor Trend US magazine and comprise fuel consumption and 10 aspects of automobile design and performance for 32 automobiles (1973-74 models), that is, 32 observations on 11 variables. The analysis addresses two questions: (i) is an automatic or manual transmission better for miles per gallon (mpg), and (ii) how large is the mpg difference between automatic and manual transmissions?
library(ggplot2)
library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
#Explore dataset
data(mtcars)
head(mtcars, 3) # show the first 3 rows; we are interested in the "mpg" and "am" columns
## mpg cyl disp hp drat wt qsec vs am gear carb
## Mazda RX4 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4
## Mazda RX4 Wag 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4
## Datsun 710 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1
str(mtcars) #note that all data are numeric by default
## 'data.frame': 32 obs. of 11 variables:
## $ mpg : num 21 21 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2 ...
## $ cyl : num 6 6 4 6 8 6 8 4 4 6 ...
## $ disp: num 160 160 108 258 360 ...
## $ hp : num 110 110 93 110 175 105 245 62 95 123 ...
## $ drat: num 3.9 3.9 3.85 3.08 3.15 2.76 3.21 3.69 3.92 3.92 ...
## $ wt : num 2.62 2.88 2.32 3.21 3.44 ...
## $ qsec: num 16.5 17 18.6 19.4 17 ...
## $ vs : num 0 0 1 1 0 1 0 1 1 1 ...
## $ am : num 1 1 1 0 0 0 0 0 0 0 ...
## $ gear: num 4 4 4 3 3 3 3 4 4 4 ...
## $ carb: num 4 4 1 1 2 1 4 2 2 4 ...
We note that the “am” column captures the transmission type (0 = automatic, 1 = manual). However, this variable is stored as numeric in the dataset, so we convert it to a factor variable to facilitate our analysis.
# keep a copy of the original dataset (in case we need the unmodified data later)
mtcars_copy <- mtcars
# change "am" from numeric to factor variable
mtcars$am <- factor(mtcars$am, labels=c("Automatic", "Manual"))
head(mtcars, 4) # show the first 4 rows to confirm that the labels have been applied
## mpg cyl disp hp drat wt qsec vs am gear carb
## Mazda RX4 21.0 6 160 110 3.90 2.620 16.46 0 Manual 4 4
## Mazda RX4 Wag 21.0 6 160 110 3.90 2.875 17.02 0 Manual 4 4
## Datsun 710 22.8 4 108 93 3.85 2.320 18.61 1 Manual 4 1
## Hornet 4 Drive 21.4 6 258 110 3.08 3.215 19.44 1 Automatic 3 1
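Because `factor()` assigns labels in the sorted order of the underlying values, it is worth confirming that 0 really maps to “Automatic” and 1 to “Manual”. The following is a minimal sketch of such a check; the working copy `cars` is a hypothetical name introduced here so that the session's `mtcars` is left untouched, and the levels are spelled out explicitly rather than relying on the default ordering.

```r
# Work on a fresh copy of the built-in data (hypothetical name `cars`)
cars <- datasets::mtcars

# State the level order explicitly so the label mapping is unambiguous
cars$am <- factor(cars$am, levels = c(0, 1), labels = c("Automatic", "Manual"))

# Cross-tabulate the original codes against the new labels;
# all counts should fall on the diagonal
recode_check <- table(original = datasets::mtcars$am, recoded = cars$am)
recode_check
```

Any non-zero off-diagonal count in the table would indicate that a label has been attached to the wrong code.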
We shall first perform exploratory data analysis and plot the data using a boxplot.
ggplot(data=mtcars, aes(am, mpg))+geom_boxplot() +labs(x="Transmission", y="Miles/(US) gallon", title="Plot of Miles Per Gallon vs Car Transmission (Automatic / Manual)")
We can see that cars with “Manual” transmission are more fuel-efficient than those with “Automatic” transmission, as is evident from the higher median value. In addition, the lower quartile of the manual cars (about 21 mpg) lies above the upper quartile of the automatic cars (about 19 mpg), which further supports the fuel advantage of manual cars, although the overall ranges of the two groups overlap. Based on this, manual transmission appears to be better in terms of MPG.
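The reading of the boxplot can be checked numerically. The sketch below prints the five-number summary (minimum, lower hinge, median, upper hinge, maximum) of mpg for each transmission type, which are the quantities a boxplot draws. The names `cars` and `mpg_five` are hypothetical and introduced only for this illustration.

```r
# Fresh copy of the data with am recoded as a factor (hypothetical name `cars`)
cars <- datasets::mtcars
cars$am <- factor(cars$am, levels = c(0, 1), labels = c("Automatic", "Manual"))

# Five-number summary of mpg per transmission type, one column per group
mpg_five <- sapply(split(cars$mpg, cars$am), fivenum)
rownames(mpg_five) <- c("min", "lower_hinge", "median", "upper_hinge", "max")
mpg_five
```

Comparing the hinges and the extremes of the two columns shows how far the boxes are separated and how much the full ranges overlap.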
We shall now use two methods to quantify the mpg difference between automatic and manual transmissions: (i) first, a simple t-test to ascertain whether the mean mpg of manual cars is significantly different from that of automatic cars; (ii) second, an ANOVA comparison of a multiple regression model against a simple regression model, to ascertain whether a difference in mean mpg between manual and automatic cars remains when other variables are held constant.
For the t-test, we split the data into “Automatic” and “Manual” cars and test whether their mean miles per gallon differ.
auto_cars <- mtcars[mtcars$am=="Automatic",]
manual_cars <- mtcars[mtcars$am=="Manual",]
t.test(auto_cars$mpg, manual_cars$mpg)
##
## Welch Two Sample t-test
##
## data: auto_cars$mpg and manual_cars$mpg
## t = -3.7671, df = 18.332, p-value = 0.001374
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -11.280194 -3.209684
## sample estimates:
## mean of x mean of y
## 17.14737 24.39231
Given that the p-value is small (0.001374), we reject the null hypothesis that the true difference in mean mpg between automatic and manual cars is 0. The 95% confidence interval for the difference (automatic minus manual) runs from -11.28 to -3.21 mpg. Since the mean miles per gallon (mpg) of manual cars (24.39) is higher than that of automatic cars (17.15), this unadjusted comparison indicates that manual transmission is better for MPG.
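The same Welch test can be run without splitting the data by hand. The sketch below uses the formula interface of `t.test()` and pulls the group means, their difference and the confidence interval out of the result object. The names `cars`, `welch` and `diff_means` are hypothetical and introduced only for this illustration.

```r
# Fresh copy of the data with am recoded as a factor (hypothetical name `cars`)
cars <- datasets::mtcars
cars$am <- factor(cars$am, levels = c(0, 1), labels = c("Automatic", "Manual"))

# Welch two-sample t-test of mpg by transmission type (unequal variances is the default)
welch <- t.test(mpg ~ am, data = cars)

# Group means, in factor-level order: Automatic first, then Manual
welch$estimate

# Unadjusted difference in mean mpg, manual minus automatic
diff_means <- unname(welch$estimate[2] - welch$estimate[1])
diff_means

# 95% confidence interval for (Automatic - Manual) and the p-value
welch$conf.int
welch$p.value
```

Reporting the difference together with its confidence interval answers the "how large" question more directly than the p-value alone.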
simple_model <- lm(mpg~am-1, data=mtcars) # remove the intercept so that each coefficient is the mean mpg of its transmission group
summary(simple_model)
##
## Call:
## lm(formula = mpg ~ am - 1, data = mtcars)
##
## Residuals:
## Min 1Q Median 3Q Max
## -9.3923 -3.0923 -0.2974 3.2439 9.5077
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## amAutomatic 17.147 1.125 15.25 1.13e-15 ***
## amManual 24.392 1.360 17.94 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 4.902 on 30 degrees of freedom
## Multiple R-squared: 0.9487, Adjusted R-squared: 0.9452
## F-statistic: 277.2 on 2 and 30 DF, p-value: < 2.2e-16
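From the simple linear regression model, we note that the mean mpg of manual cars (24.392) is higher than that of automatic cars (17.147). Without an intercept, the two coefficients are simply the group means, so they match the sample estimates of the t-test.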
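For comparison, the sketch below fits the same simple model with and without an intercept. Both parameterisations give identical fitted values; with an intercept, the `amManual` coefficient is the difference between the two group means instead of the manual mean itself. Note also that R computes R-squared relative to zero rather than relative to the mean when the intercept is removed, so the two R-squared values are not comparable. The names `cars`, `no_intercept` and `with_intercept` are hypothetical and introduced only for this illustration.

```r
# Fresh copy of the data with am recoded as a factor (hypothetical name `cars`)
cars <- datasets::mtcars
cars$am <- factor(cars$am, levels = c(0, 1), labels = c("Automatic", "Manual"))

# Same model, two parameterisations
no_intercept   <- lm(mpg ~ am - 1, data = cars)  # coefficients = group means
with_intercept <- lm(mpg ~ am, data = cars)      # amManual = manual mean - automatic mean

coef(no_intercept)
coef(with_intercept)

# The fitted values agree, so the two fits describe the data equally well
all.equal(unname(fitted(no_intercept)), unname(fitted(with_intercept)))

# R-squared differs only because of how it is defined without an intercept
c(no_intercept   = summary(no_intercept)$r.squared,
  with_intercept = summary(with_intercept)$r.squared)
```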
Next, we shall regress mpg against all available variables in the dataset, and then use a stepwise method to test and remove variables that are not significant.
multiple_regression <- step(lm(mpg~., data=mtcars))
Based on the results of the stepwise regression, the model with the lowest AIC is selected, which is: mpg~wt + qsec + am. This means that the key variables that affect the miles per gallon (mpg) of cars are: weight (wt), quarter-mile time (qsec, a measure of acceleration), and transmission type (am). We shall use this as our complex model.
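By default `step()` prints every intermediate model, which makes the selected formula hard to spot. The sketch below runs the same default stepwise search with the trace switched off and then extracts the selected formula. The names `cars`, `full_model` and `selected` are hypothetical and introduced only for this illustration.

```r
# Fresh copy of the data with am recoded as a factor (hypothetical name `cars`)
cars <- datasets::mtcars
cars$am <- factor(cars$am, levels = c(0, 1), labels = c("Automatic", "Manual"))

# Start from the model with all predictors
full_model <- lm(mpg ~ ., data = cars)

# AIC-based stepwise search with default settings; trace = 0 suppresses the step-by-step output
selected <- step(full_model, trace = 0)

# Formula and term labels of the selected model
formula(selected)
attr(terms(selected), "term.labels")
```

Stepwise selection picks a model by AIC alone, so the retained terms are best read as a working model rather than as proof that the dropped variables are irrelevant.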
complex_model <- lm(mpg~wt + qsec + am-1, data=mtcars)
complex_model
##
## Call:
## lm(formula = mpg ~ wt + qsec + am - 1, data = mtcars)
##
## Coefficients:
## wt qsec amAutomatic amManual
## -3.917 1.226 9.618 12.554
anova(simple_model, complex_model)
## Analysis of Variance Table
##
## Model 1: mpg ~ am - 1
## Model 2: mpg ~ wt + qsec + am - 1
## Res.Df RSS Df Sum of Sq F Pr(>F)
## 1 30 720.90
## 2 28 169.29 2 551.61 45.618 1.55e-09 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
The very small p-value of the ANOVA comparison (1.55e-09) means that we can reject the null hypothesis that the added terms contribute nothing, in favour of the alternative hypothesis that, besides transmission, the weight and quarter-mile time of cars also have a significant impact on mpg. Holding weight and quarter-mile time constant, manual cars still offer better fuel efficiency than automatic cars, by about 2.94 mpg (12.554 - 9.618).
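The adjusted difference of about 2.94 mpg can also be read off directly, together with its uncertainty, by fitting the complex model with an intercept: the `amManual` coefficient is then the manual-minus-automatic difference at fixed weight and quarter-mile time. The sketch below is one way to do this. The names `cars`, `reduced` and `adjusted` are hypothetical and introduced only for this illustration.

```r
# Fresh copy of the data with am recoded as a factor (hypothetical name `cars`)
cars <- datasets::mtcars
cars$am <- factor(cars$am, levels = c(0, 1), labels = c("Automatic", "Manual"))

# Simple and complex models, both with an intercept
reduced  <- lm(mpg ~ am, data = cars)
adjusted <- lm(mpg ~ wt + qsec + am, data = cars)

# Adjusted manual-minus-automatic difference in mpg
coef(adjusted)["amManual"]

# 95% confidence interval for that difference
confint(adjusted, "amManual", level = 0.95)

# Nested-model F test: do wt and qsec improve on transmission alone?
model_comparison <- anova(reduced, adjusted)
model_comparison
```

The width of the confidence interval shows how precisely the adjusted difference is estimated from 32 cars, which the point estimate alone does not convey.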
Let’s take a look at the wt and qsec plots and see if there’s a difference between automatic and manual cars.
# Change box plot colors by groups
require(gridExtra) # the gridExtra package needs to be installed
## Loading required package: gridExtra
##
## Attaching package: 'gridExtra'
## The following object is masked from 'package:dplyr':
##
## combine
library(ggplot2)
plot1 <- ggplot(data=mtcars, aes(x=am, y=wt, fill=am)) +
  geom_boxplot()
plot2 <- ggplot(data=mtcars, aes(x=am, y=qsec, fill=am)) +
  geom_boxplot()
grid.arrange(plot1, plot2, ncol=2)
From the above plots, we note that automatic cars are heavier (median weight above 3,500 lbs) than manual cars (median weight below 2,500 lbs). This sizeable difference in weight probably contributes to the lower miles per gallon (mpg) of automatic cars.
We note that the quarter-mile times (qsec) of manual and automatic cars overlap. Because qsec is the time in seconds needed to cover a quarter mile, a larger value means slower acceleration. On balance, automatic cars are moderately slower (median qsec slightly below 18 seconds) than manual cars (median qsec about 17 seconds). Since the qsec coefficient in the complex model is positive (1.226), slower cars tend to have higher mpg at a given weight and transmission type, so the lower mpg of automatic cars is more plausibly linked to their weight than to their acceleration. However, deeper analysis would be required to validate this hypothesis.
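The medians read off the two boxplots can be computed directly. The sketch below tabulates the median weight (in units of 1,000 lbs) and the median quarter-mile time (in seconds) for each transmission type. The names `cars` and `group_medians` are hypothetical and introduced only for this illustration.

```r
# Fresh copy of the data with am recoded as a factor (hypothetical name `cars`)
cars <- datasets::mtcars
cars$am <- factor(cars$am, levels = c(0, 1), labels = c("Automatic", "Manual"))

# Median wt (1,000 lbs) and qsec (seconds per quarter mile) by transmission type
group_medians <- aggregate(cbind(wt, qsec) ~ am, data = cars, FUN = median)
group_medians
```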
We shall use the Studentized Residual Plot from the olsrr package to detect potential outliers in our complex model. This method considers a point an outlier if its studentized residual has an absolute value higher than 3.
library(olsrr)
##
## Attaching package: 'olsrr'
## The following object is masked from 'package:datasets':
##
## rivers
ols_plot_resid_stud(complex_model)
We note that two observations (17 and 18) stand out with the largest studentized residuals, but both are still within the threshold of 3. No observation is therefore flagged as an outlier by this criterion, and no further adjustments to our complex model are made on these grounds.
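The same check can be sketched numerically in base R with `rstudent()`, which returns externally studentized residuals. Whether the olsrr plot uses exactly this definition is an assumption here, so the numbers may differ slightly from those in the plot. The names `cars`, `fit` and `stud` are hypothetical and introduced only for this illustration.

```r
# Fresh copy of the data with am recoded as a factor (hypothetical name `cars`)
cars <- datasets::mtcars
cars$am <- factor(cars$am, levels = c(0, 1), labels = c("Automatic", "Manual"))

# Complex model (with an intercept; the residuals are the same as without one)
fit <- lm(mpg ~ wt + qsec + am, data = cars)

# Externally studentized residuals, one per car
stud <- rstudent(fit)

# The three largest residuals in absolute value, labelled by car name
head(sort(abs(stud), decreasing = TRUE), 3)

# TRUE if any observation exceeds the threshold of 3
any(abs(stud) > 3)
```

Printing the residuals by car name makes it easy to see which vehicles the model fits least well, even when none of them crosses the threshold.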
In conclusion, manual cars appear to have better fuel efficiency than automatic cars: about 7.24 mpg higher on average, and about 2.94 mpg higher once weight and quarter-mile time are held constant.