knitr::opts_chunk$set(echo = TRUE)
Motor Trend magazine is interested in exploring the relationship between a set of variables and fuel efficiency, measured in miles per gallon (MPG). This report uses the mtcars data set, extracted from the 1974 Motor Trend US magazine, to address two questions:

1. Is an automatic or manual transmission better for MPG?
2. What is the MPG difference between automatic and manual transmissions?

The analysis below shows that, after adjusting for number of cylinders, displacement, horsepower and weight, manual transmission cars achieve about 1.55 more miles per gallon than automatic transmission cars.
The mtcars data set is loaded into R and its variables are prepared for analysis. The transmission variable (am), coded 0 = automatic and 1 = manual, is converted into a factor with the labels Automatic and Manual.
library(datasets); data(mtcars);head(mtcars)
str(mtcars)
## convert am variable to a factor
mtcars$am <- factor(mtcars$am, labels = c("Automatic", "Manual"))
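A quick way to confirm that the recoding matches the mtcars documentation (0 = automatic, 1 = manual) is to inspect the factor levels and group sizes. This is a sketch that works on a local copy named `mt` (a hypothetical name, not from the original report) so that the `mtcars` object used in the rest of the analysis is left untouched.

```r
## hypothetical local copy; the report's own mtcars object is not modified
mt <- datasets::mtcars
## explicit levels make the 0 -> Automatic, 1 -> Manual mapping visible
mt$am <- factor(mt$am, levels = c(0, 1), labels = c("Automatic", "Manual"))
levels(mt$am)
table(mt$am)
```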
A box plot of MPG by Transmission is plotted to better visualise the relationship between the two variables. (see appendix)
library(ggplot2)
## am is already a factor, so it can be mapped directly to the x axis and fill colour
p <- ggplot(mtcars, aes(am, mpg, fill = am))
p + geom_boxplot() + ggtitle("Plot of MPG by Transmission Type") + xlab("Transmission") + ylab("Miles per Gallon")
The plot shows that manual transmission cars tend to achieve higher MPG: the median is about 23 mpg for manual cars and about 17 mpg for automatic cars.
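Because a box plot displays medians rather than means, it can help to compute both summaries per transmission type. The following sketch uses base R's `tapply()` on a local copy `mt`; the names `mt`, `grp_median` and `grp_mean` are hypothetical and chosen only for this illustration.

```r
## hypothetical local copy with the same factor coding as the report
mt <- datasets::mtcars
mt$am <- factor(mt$am, levels = c(0, 1), labels = c("Automatic", "Manual"))

## box plots show the median; the mean is computed alongside for comparison
grp_median <- tapply(mt$mpg, mt$am, median)
grp_mean   <- tapply(mt$mpg, mt$am, mean)
round(rbind(median = grp_median, mean = grp_mean), 2)
```

The medians (about 17 and 23 mpg) are what the boxes show; the means differ somewhat, especially for manual cars.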
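A simple linear regression is fitted with mpg as the outcome and transmission as the only predictor. Because am is a two-level factor, R encodes it as a single dummy variable, so this model estimates the MPG difference between the two groups without accounting for any other variables.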
## am is already a factor, so it can be referenced by name inside the formula
fit <- lm(mpg ~ am, data = mtcars)
summary(fit)
The adjusted R-squared is 0.3385, so transmission type alone explains only about a third of the variability in mpg. This is not an adequate model on its own, and other variables need to be included to obtain a better one.
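To make the dummy-variable interpretation concrete: with treatment contrasts (R's default for unordered factors), `lm()` represents am by a single 0/1 column named `amManual`. The intercept is then the mean mpg of automatic cars and the `amManual` coefficient is the difference between the manual and automatic means. This sketch refits the model on a hypothetical local copy `mt` under the hypothetical name `fit_am`.

```r
## hypothetical local copy and model name
mt <- datasets::mtcars
mt$am <- factor(mt$am, levels = c(0, 1), labels = c("Automatic", "Manual"))
fit_am <- lm(mpg ~ am, data = mt)

## design matrix: an intercept column plus the 0/1 dummy column "amManual"
head(model.matrix(fit_am))

## intercept = automatic mean; amManual = manual mean minus automatic mean
coef(fit_am)
diff(tapply(mt$mpg, mt$am, mean))
summary(fit_am)$adj.r.squared
```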
The model below includes all the variables as predictors to see how each relates to mpg.
fitall <- lm(mpg ~ ., mtcars)
summary(fitall)$coef
summary(fitall)$r.squared
The model has an R-squared of 86.9%, higher than that of the initial model. Because R-squared never decreases when predictors are added, the adjusted R-squared is the fairer basis for comparing the two models. To arrive at a more parsimonious model, I select a subset of predictors based on the coefficients of fitall: am, cyl, hp and wt.
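The difference between R-squared and adjusted R-squared can be checked directly. Adjusted R-squared is 1 - (1 - R^2)(n - 1)/(n - p - 1), where n is the number of observations and p the number of slope coefficients, so it penalises extra predictors. This sketch recomputes it for the transmission-only and all-variables models; `mt`, `fit_am`, `fit_all` and `adj_r2` are hypothetical names introduced only for this example.

```r
## hypothetical local copy and model names
mt <- datasets::mtcars
mt$am <- factor(mt$am, levels = c(0, 1), labels = c("Automatic", "Manual"))
fit_am  <- lm(mpg ~ am, data = mt)
fit_all <- lm(mpg ~ ., data = mt)

## hypothetical helper: adjusted R-squared from its definition (model with intercept)
adj_r2 <- function(fit) {
  r2 <- summary(fit)$r.squared
  n  <- nobs(fit)
  p  <- length(coef(fit)) - 1   # slope terms, excluding the intercept
  1 - (1 - r2) * (n - 1) / (n - p - 1)
}

## compare the hand-computed value with the one reported by summary()
sapply(list(am_only = fit_am, all_vars = fit_all), function(f)
  c(r2 = summary(f)$r.squared, adj = adj_r2(f), adj_lm = summary(f)$adj.r.squared))
```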
Next, the variance inflation factors (VIFs) of all the predictors in fitall are calculated to check for multicollinearity and to help decide which variables to include or exclude.
library(car)
vif(fitall); sqrt(vif(fitall))
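The output shows that cyl, disp, hp and wt have high variance inflation factors, meaning each is strongly correlated with the other predictors. A high VIF indicates multicollinearity, which inflates the standard errors of the affected coefficients; it is not in itself a reason to include a variable. These four variables are nevertheless retained alongside am in the model below as adjustment variables, so their individual coefficients should be interpreted with caution.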
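For reference, the VIF of predictor j is 1 / (1 - R_j^2), where R_j^2 is the R-squared from regressing predictor j on all the other predictors, and its square root is the factor by which that coefficient's standard error is inflated by collinearity. The sketch below computes this by hand with base R only; `mt`, `predictors` and `vif_manual` are hypothetical names. It keeps am numeric (0/1), which gives the same VIF as the two-level factor because the dummy column is identical.

```r
## hypothetical local copy; am stays numeric 0/1 here
mt <- datasets::mtcars
predictors <- setdiff(names(mt), "mpg")

## VIF_j = 1 / (1 - R_j^2), regressing each predictor on all the others
vif_manual <- sapply(predictors, function(v) {
  others <- setdiff(predictors, v)
  r2_j <- summary(lm(reformulate(others, response = v), data = mt))$r.squared
  1 / (1 - r2_j)
})

## sqrt(VIF): standard-error inflation relative to uncorrelated predictors
round(cbind(vif = vif_manual, sqrt_vif = sqrt(vif_manual)), 2)
```

If the car package is installed, `car::vif(lm(mpg ~ ., data = mt))` should return the same values.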
Fitting a regression with these variables gives the following output. Here am is the predictor of interest, while cyl, disp, hp and wt are included to adjust for differences between the cars.
fit1 <- lm(mpg ~ am + cyl + disp + hp + wt, data = mtcars)
summary(fit1)
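The amManual coefficient of 1.55 means that, holding cyl, disp, hp and wt constant, manual transmission cars are estimated to achieve 1.55 more miles per gallon on average than automatic transmission cars. The multiple R-squared shows that the model explains about 85% of the variability in mpg.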
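The "holding the other predictors constant" interpretation can be illustrated with predictions: two hypothetical cars that share the same cyl, disp, hp and wt (set here to the sample means, an arbitrary choice) and differ only in transmission. In this additive model their predicted mpg differs by exactly the amManual coefficient, an additive gap in mpg rather than a multiplicative factor. The names `mt`, `fit_adj`, `newcars` and `pred` are hypothetical.

```r
## hypothetical local copy and model name
mt <- datasets::mtcars
mt$am <- factor(mt$am, levels = c(0, 1), labels = c("Automatic", "Manual"))
fit_adj <- lm(mpg ~ am + cyl + disp + hp + wt, data = mt)

## two hypothetical cars, identical except for transmission
newcars <- data.frame(
  am   = factor(c("Automatic", "Manual"), levels = levels(mt$am)),
  cyl  = mean(mt$cyl), disp = mean(mt$disp),
  hp   = mean(mt$hp),  wt   = mean(mt$wt)
)
pred <- predict(fit_adj, newdata = newcars)

## the predicted gap equals the amManual coefficient
diff(pred)
coef(fit_adj)["amManual"]
```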
In conclusion, after adjusting for cyl, disp, hp and wt, manual transmission vehicles achieve about 1.55 more miles per gallon than automatic transmission vehicles.