In this project, we examine a data set of cars and explore the relationship between a set of variables and miles per gallon (MPG), the outcome. By the end of the project, we aim to answer the following two questions: 1. “Is an automatic or manual transmission better for MPG?” 2. “What value(s) quantify the MPG difference between automatic and manual transmissions?”
First, the Motor Trend dataset (mtcars) was loaded into R, along with the libraries required for the analysis.
library(ggplot2)
library(datasets)
library(GGally)
data("mtcars")
Next, R’s head function was used to view the first rows of the dataset, and the str function was used to inspect the type of each column. The transmission type (am) was then converted to a factor with the levels automatic (0) and manual (1).
head(mtcars)
str(mtcars)
## 'data.frame': 32 obs. of 11 variables:
## $ mpg : num 21 21 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2 ...
## $ cyl : num 6 6 4 6 8 6 8 4 4 6 ...
## $ disp: num 160 160 108 258 360 ...
## $ hp : num 110 110 93 110 175 105 245 62 95 123 ...
## $ drat: num 3.9 3.9 3.85 3.08 3.15 2.76 3.21 3.69 3.92 3.92 ...
## $ wt : num 2.62 2.88 2.32 3.21 3.44 ...
## $ qsec: num 16.5 17 18.6 19.4 17 ...
## $ vs : num 0 0 1 1 0 1 0 1 1 1 ...
## $ am : num 1 1 1 0 0 0 0 0 0 0 ...
## $ gear: num 4 4 4 3 3 3 3 4 4 4 ...
## $ carb: num 4 4 1 1 2 1 4 2 2 4 ...
mtcars$am=as.factor(mtcars$am)
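As an optional sketch (not part of the original analysis), the factor can also be given descriptive labels so that plots and tables read "Automatic" and "Manual" instead of 0 and 1. The name `cars_labelled` is an assumption introduced here; working on a copy keeps the rest of the analysis, which compares am against "0" and "1", unchanged.

```r
# Work on a copy so the original mtcars (with levels "0"/"1") is untouched.
cars_labelled <- datasets::mtcars
cars_labelled$am <- factor(cars_labelled$am,
                           levels = c(0, 1),
                           labels = c("Automatic", "Manual"))

# Count the cars in each transmission group.
am_counts <- table(cars_labelled$am)
print(am_counts)
```

If the labelled factor were used throughout, the later comparisons such as `mtcars$am=="1"` would need to be changed to match the new labels.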
Next, a boxplot was drawn to visualize the relationship between transmission type (am) and miles per gallon (mpg), and the mean mpg of each transmission group was computed.
gfit1=ggplot(mtcars,aes(x=am,y=mpg,fill=am))
gfit1=gfit1 + geom_boxplot()
gfit1
manual<-mean(mtcars[mtcars$am=="1",]$mpg)
auto<-mean(mtcars[mtcars$am=="0",]$mpg)
newTab<-data.frame(manual=manual,auto=auto)
rownames(newTab)<-"Mean"
newTab
From the boxplot and the group means above, it can be seen that cars with manual transmission (1) achieve a higher average mpg than cars with automatic transmission (0).
The anova function was then used to select the best regression model. First, I fitted a model of miles per gallon (mpg) on all variables. Then I fitted a sequence of nested models, starting with transmission type (am) alone, then adding the number of cylinders (cyl) as a possible confounder of the transmission effect, and then adding weight (wt), gross horsepower (hp), and displacement (disp) in successive fits.
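One way to check whether this raw difference in means could be due to chance is a Welch two-sample t-test. The sketch below is an addition to the original analysis; the names `cars`, `tt` and `group_means` are assumptions introduced here. It compares the two groups directly and does not adjust for any other variable.

```r
# Use a fresh copy of the data so this sketch runs on its own.
cars <- datasets::mtcars
cars$am <- factor(cars$am)  # "0" = automatic, "1" = manual

# Welch two-sample t-test of mpg between the two transmission groups.
tt <- t.test(mpg ~ am, data = cars)

# Group means and the p-value for the difference in means.
group_means <- tapply(cars$mpg, cars$am, mean)
print(group_means)
print(tt$p.value)
```

A small p-value here only says that the unadjusted group means differ; it does not say that transmission type is the cause, since heavier or more powerful cars may be concentrated in one group.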
fit1=lm(mpg~.,data = mtcars)
summary(fit1)
##
## Call:
## lm(formula = mpg ~ ., data = mtcars)
##
## Residuals:
## Min 1Q Median 3Q Max
## -3.4506 -1.6044 -0.1196 1.2193 4.6271
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 12.30337 18.71788 0.657 0.5181
## cyl -0.11144 1.04502 -0.107 0.9161
## disp 0.01334 0.01786 0.747 0.4635
## hp -0.02148 0.02177 -0.987 0.3350
## drat 0.78711 1.63537 0.481 0.6353
## wt -3.71530 1.89441 -1.961 0.0633 .
## qsec 0.82104 0.73084 1.123 0.2739
## vs 0.31776 2.10451 0.151 0.8814
## am1 2.52023 2.05665 1.225 0.2340
## gear 0.65541 1.49326 0.439 0.6652
## carb -0.19942 0.82875 -0.241 0.8122
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.65 on 21 degrees of freedom
## Multiple R-squared: 0.869, Adjusted R-squared: 0.8066
## F-statistic: 13.93 on 10 and 21 DF, p-value: 3.793e-07
fit2=lm(mpg~am,data = mtcars)
fit3=lm(mpg~am+cyl,data = mtcars)
fit4=lm(mpg~am+cyl+wt,data = mtcars)
fit5=lm(mpg~am+cyl+hp+wt,data = mtcars)
fit6=lm(mpg~am+cyl+hp+wt+disp,data = mtcars)
anova(fit2,fit3,fit4,fit5,fit6,fit1)
From the analysis of variance, I deduced that Model 5 and Model 6 in the anova table (fit6 and fit1) do not significantly improve the fit, hence we can take Model 4 in the table, which is fit5 (mpg ~ am + cyl + hp + wt), as the best model to quantify the effects of the different variables on miles per gallon (mpg).
After selecting the best model, I used the summary function to see the estimated effect of each variable on miles per gallon (mpg).
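Because the anova table numbers the models by their position in the call rather than by their variable names, it can help to pull the p-values out next to those positions. The following is a minimal sketch of that step; the names `m1` to `m6`, `cmp` and `p_values` are assumptions introduced here and mirror fit2, fit3, fit4, fit5, fit6 and fit1 in that order.

```r
cars <- datasets::mtcars
cars$am <- factor(cars$am)

# Same nested sequence as in the analysis, ending with the full model.
m1 <- lm(mpg ~ am, data = cars)
m2 <- lm(mpg ~ am + cyl, data = cars)
m3 <- lm(mpg ~ am + cyl + wt, data = cars)
m4 <- lm(mpg ~ am + cyl + hp + wt, data = cars)
m5 <- lm(mpg ~ am + cyl + hp + wt + disp, data = cars)
m6 <- lm(mpg ~ ., data = cars)
cmp <- anova(m1, m2, m3, m4, m5, m6)

# One row per model; the first row has no p-value because no model precedes it.
p_values <- cmp[["Pr(>F)"]]
print(data.frame(model = 1:6, p_value = round(p_values, 4)))
```

Each p-value tests whether the terms added at that step improve on the previous model, so a row should be read as "this model versus the one before it".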
fit5=lm(mpg~am+cyl+hp+wt, data = mtcars)
summary(fit5)
##
## Call:
## lm(formula = mpg ~ am + cyl + hp + wt, data = mtcars)
##
## Residuals:
## Min 1Q Median 3Q Max
## -3.4765 -1.8471 -0.5544 1.2758 5.6608
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 36.14654 3.10478 11.642 4.94e-12 ***
## am1 1.47805 1.44115 1.026 0.3142
## cyl -0.74516 0.58279 -1.279 0.2119
## hp -0.02495 0.01365 -1.828 0.0786 .
## wt -2.60648 0.91984 -2.834 0.0086 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.509 on 27 degrees of freedom
## Multiple R-squared: 0.849, Adjusted R-squared: 0.8267
## F-statistic: 37.96 on 4 and 27 DF, p-value: 1.025e-10
From the summary of the linear model, the intercept of 36.15 is the predicted mpg for an automatic car with cyl, hp and wt all equal to zero, so it is a baseline of the model rather than a realistic average. Holding cyl, hp and wt constant, cars with manual transmission are estimated to get 1.48 more miles per gallon than automatic cars (36.15 + 1.48 = 37.62 at that baseline). With a p-value of 0.31, however, this adjusted difference is not statistically significant. We can also see from the R-squared value that the model explains about 85% of the variance in miles per gallon.
Finally, we look at the diagnostic plots of the selected model and at a pair plot of the variables it uses (mpg, cyl, hp, wt and am).
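To put a range around the transmission effect rather than a single number, a confidence interval for the am1 coefficient can be computed. This is a sketch added to the original analysis; the names `cars`, `fit`, `am_effect` and `am_ci` are assumptions introduced here, and `fit` is the same formula as fit5.

```r
cars <- datasets::mtcars
cars$am <- factor(cars$am)

# Same formula as the selected model (fit5).
fit <- lm(mpg ~ am + cyl + hp + wt, data = cars)

# Adjusted manual-vs-automatic difference and its 95% confidence interval.
am_effect <- coef(fit)[["am1"]]
am_ci <- confint(fit, "am1", level = 0.95)
print(am_effect)
print(am_ci)

# Share of the variance in mpg explained by the model.
r2 <- summary(fit)$r.squared
print(r2)
```

An interval that includes zero is consistent with the non-significant p-value reported for am1 in the summary output.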
par(mfrow=c(2,2))
plot(fit5)
ggpairs(mtcars,columns = c(1,2,4,6,9),aes(colour=am))
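The diagnostic plots can be complemented with a few numeric checks. The sketch below is an addition to the original analysis, with the assumed names `cars`, `fit`, `lev` and `cook`; it computes leverage and Cook's distance for each car and runs a Shapiro-Wilk test on the residuals.

```r
cars <- datasets::mtcars
cars$am <- factor(cars$am)

# Same formula as the selected model (fit5).
fit <- lm(mpg ~ am + cyl + hp + wt, data = cars)

# Leverage and Cook's distance for each car.
lev <- hatvalues(fit)
cook <- cooks.distance(fit)

# The three cars with the largest Cook's distance.
print(head(sort(cook, decreasing = TRUE), 3))

# Shapiro-Wilk test of residual normality.
print(shapiro.test(resid(fit)))
```

Cars that stand out here are the same ones that would appear labelled in the residual plots, so the two views can be read together.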