We have been tasked with analysing data for Motor Trend, a magazine about the automobile industry. Looking at a data set covering a collection of cars, they are interested in exploring the relationship between a set of variables and miles per gallon (mpg), the outcome. They are particularly interested in the following two questions:
“Is an automatic or manual transmission better for mpg”
“Quantify the mpg difference between automatic and manual transmissions”
Our aim is to answer these two questions through exploratory and inferential analysis, and to close by fitting a regression model that supports our conclusions.
data(mtcars)
str(mtcars)
## 'data.frame': 32 obs. of 11 variables:
## $ mpg : num 21 21 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2 ...
## $ cyl : num 6 6 4 6 8 6 8 4 4 6 ...
## $ disp: num 160 160 108 258 360 ...
## $ hp : num 110 110 93 110 175 105 245 62 95 123 ...
## $ drat: num 3.9 3.9 3.85 3.08 3.15 2.76 3.21 3.69 3.92 3.92 ...
## $ wt : num 2.62 2.88 2.32 3.21 3.44 ...
## $ qsec: num 16.5 17 18.6 19.4 17 ...
## $ vs : num 0 0 1 1 0 1 0 1 1 1 ...
## $ am : num 1 1 1 0 0 0 0 0 0 0 ...
## $ gear: num 4 4 4 3 3 3 3 4 4 4 ...
## $ carb: num 4 4 1 1 2 1 4 2 2 4 ...
To begin, we load the data and take a quick look at its structure. For now we are interested in the relationship between transmission type (am, automatic or manual) and miles per gallon (mpg). We compare the mean mpg of the two transmission types to see whether they differ. Note: am is coded 0 = automatic, 1 = manual.
t.m<-tapply(mtcars$mpg,mtcars$am,mean)
names(t.m)<-c("Automatic","Manual")
d.tm<-abs(t.m[1]-t.m[2])
t.m
## Automatic    Manual
##  17.14737  24.39231
The mean mpg differs by 7.2449393 between the two transmission types, with manual transmission having the higher mpg. We can test whether this difference is statistically significant.
t.t<-with(mtcars,t.test(mpg~am))
p.tt<-round(t.t$p.value,8)
The t test (R's default Welch two-sample test) gives a p value of 0.0013736, so we reject the null hypothesis of equal means at the 5% level. This supports the difference in mpg between manual and automatic transmissions that we suspected earlier.
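The p value alone does not show how large the difference might plausibly be. As a minimal sketch, the same test object also carries the group means and a 95% confidence interval for the difference; the names `tt`, `group_means` and `ci` are illustrative and not part of the original analysis.

```r
# Welch two-sample t-test of mpg by transmission (am: 0 = automatic, 1 = manual)
data(mtcars)
tt <- t.test(mpg ~ am, data = mtcars)

# Mean mpg in each group, in the order automatic (0), manual (1)
group_means <- tt$estimate

# 95% confidence interval for mean(automatic) - mean(manual)
ci <- tt$conf.int

print(group_means)
print(ci)
```

Because the interval is for automatic minus manual, an interval lying entirely below zero points the same way as the p value: manual cars have the higher mean mpg.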
In order to “quantify” the mpg difference, we need a model that predicts mpg well using a combination of the variables we have. We will use 3 models: Model 1 regresses mpg on am only, Model 2 regresses mpg on all the other variables, and Model 3 is chosen by the stepwise function step(), which searches for the model with the lowest AIC. We then compare the three models by adjusted R squared.
m1<-lm(data=mtcars,mpg~am)
m2<-lm(data=mtcars,mpg~.)
m3<-step(lm(data=mtcars,mpg~.),trace=0)
rm1<-round(summary(m1)$adj.r.squared,8)
rm2<-round(summary(m2)$adj.r.squared,8)
rm3<-round(summary(m3)$adj.r.squared,8)
Fitting the 3 models gives the following adjusted R squared values: Model 1 (0.3384589), Model 2 (0.8066423), and Model 3 (0.8335561).
Model 3 (m3) has the highest adjusted R squared value.
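Adjusted R squared is one way to compare the models. Since step() itself selects on AIC, and the three models are nested (Model 1 inside Model 3 inside Model 2), a hedged cross-check might compare AIC values and run nested F tests. The names `aic_tab` and `nested` are illustrative.

```r
data(mtcars)
m1 <- lm(mpg ~ am, data = mtcars)   # transmission only
m2 <- lm(mpg ~ ., data = mtcars)    # all predictors
m3 <- step(m2, trace = 0)           # stepwise selection by AIC

# AIC for each fit; lower values indicate a better trade-off
# between goodness of fit and model size
aic_tab <- AIC(m1, m2, m3)
print(aic_tab)

# Sequential F-tests: does each larger model improve on the smaller one?
nested <- anova(m1, m3, m2)
print(nested)
```

If the step from Model 3 to Model 2 is not significant in the anova table, the extra predictors in the full model are not earning their place, which would agree with the adjusted R squared ranking.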
summary(m3)
##
## Call:
## lm(formula = mpg ~ wt + qsec + am, data = mtcars)
##
## Residuals:
##     Min      1Q  Median      3Q     Max
## -3.4811 -1.5555 -0.7257  1.4110  4.6610
##
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept)   9.6178     6.9596   1.382 0.177915
## wt           -3.9165     0.7112  -5.507 6.95e-06 ***
## qsec          1.2259     0.2887   4.247 0.000216 ***
## am            2.9358     1.4109   2.081 0.046716 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.459 on 28 degrees of freedom
## Multiple R-squared: 0.8497, Adjusted R-squared: 0.8336
## F-statistic: 52.75 on 3 and 28 DF, p-value: 1.21e-11
This indicates that the selected model uses wt, qsec, and am as predictor variables. wt has a negative relationship with mpg, while qsec and am have positive relationships. In relation to our objective (mpg and am), manual transmission is estimated to give 2.94 more mpg than automatic transmission when wt and qsec are held constant, meaning automatic cars travel fewer miles per gallon than comparable manual cars. This adjusted difference is much smaller than the raw difference in means of about 7.24 mpg, and its p value (0.046716) is only just below 0.05. To conclude, Motor Trend should consider not only am but also wt and qsec when looking at what drives mpg.
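To put a range around the 2.94 mpg estimate, one option is a confidence interval for the am coefficient. This is a sketch that refits the selected formula directly instead of re-running step(); the name `ci_am` is illustrative.

```r
data(mtcars)

# Refit the model selected above: mpg ~ wt + qsec + am
m3 <- lm(mpg ~ wt + qsec + am, data = mtcars)

# 95% confidence interval for the manual-vs-automatic effect,
# holding weight (wt) and quarter-mile time (qsec) constant
ci_am <- confint(m3, "am", level = 0.95)
print(ci_am)
```

Given the p value just under 0.05, the lower end of this interval should sit only slightly above zero, so the adjusted advantage of manual transmission is estimated with considerable uncertainty.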
We can also look at a boxplot of mpg by transmission type and at the residual plots of the model as a further check on our conclusions.
require(ggplot2)
## Loading required package: ggplot2
p<-ggplot(mtcars,aes(factor(am,labels=c("Automatic","Manual")),mpg))
p+geom_boxplot()+labs(title="Boxplot of mpg by Transmission Type",x="Transmission Type",y="Miles per gallon (mpg)")
plot(m3)
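The four diagnostic plots produced by plot(m3) are easier to read side by side, and a formal normality check on the residuals can complement the visual impression. The following is a sketch only; the Shapiro-Wilk test is an addition not used in the original analysis, and the names `op` and `sw` are illustrative.

```r
data(mtcars)
m3 <- lm(mpg ~ wt + qsec + am, data = mtcars)

# Show the four standard diagnostic plots in a 2 x 2 grid,
# then restore the previous graphics settings
op <- par(mfrow = c(2, 2))
plot(m3)
par(op)

# Shapiro-Wilk test of normality on the residuals;
# a large p value gives no evidence against normality
sw <- shapiro.test(residuals(m3))
print(sw)
```

With only 32 cars the test has limited power, so it is best read alongside the Q-Q plot instead of as a replacement for it.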