This report analyzes the mtcars dataset and explores the relationship between a set of variables and miles per gallon (MPG). The data come from the 1974 Motor Trend magazine and comprise fuel consumption and 10 other aspects of automobile design and performance.
The questions it tries to answer are:
- Is an automatic or manual transmission better for miles per gallon (MPG)?
- How different is the MPG between automatic and manual transmissions?
The analysis uses linear regression models and exploratory data analysis to examine how the transmission type (automatic or manual) affects MPG.
Manual-transmission cars have higher MPG than automatic ones, by approximately 1.8 MPG when switching from an automatic to a manual transmission.
Mean MPG differs significantly between automatic and manual cars, and the box plot shows a corresponding gap between the medians.
Load the mtcars dataset and convert some variables from numeric to factors.
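The conversion code itself is not shown in the report. A minimal sketch, assuming the converted variables are cyl, vs, am, gear and carb (an assumption, chosen because it reproduces the 15 residual degrees of freedom reported for the full model) and using a hypothetical copy named `mt` so the built-in data stay untouched:

```r
# Work on a copy of the built-in dataset; `mt` is a hypothetical name.
data(mtcars)
mt <- mtcars

# Assumed set of categorical variables (not stated in the report).
factor_vars <- c("cyl", "vs", "am", "gear", "carb")
mt[factor_vars] <- lapply(mt[factor_vars], factor)

# Inspect the result: five factors and six numeric columns.
str(mt)

# With these factors the full model uses 17 parameters, leaving 32 - 17 = 15 df.
df.residual(lm(mpg ~ ., data = mt))
```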
See the Appendix: Figures section for the plots. The box plot shows that manual transmissions generally yield higher MPG, and the pair plot shows strong correlations between mpg and variables such as “wt”, “disp”, “cyl” and “hp”.
For inference, the null hypothesis is that the mpg values of manual and automatic cars come from the same population. We test it with a two-sample t-test (t.test() runs the Welch version by default).
# Pass the data explicitly and avoid masking base::t()
ttest <- t.test(mpg ~ am, data = mtcars)
ttest$p.value
ttest$estimate
With a p-value of 0.001373638, the null hypothesis is rejected, which indicates that automatic and manual cars come from different populations. The mean mpg of manual cars is about 7.2 higher than that of automatic cars.
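As a cross-check on the test, the sketch below recomputes the difference in group means by hand. It assumes the built-in mtcars with am coded 0 = automatic and 1 = manual; `welch` and `group_means` are illustrative names.

```r
data(mtcars)

# Welch two-sample t-test (the default: var.equal = FALSE).
welch <- t.test(mpg ~ am, data = mtcars)

# Group means computed directly: am = 0 is automatic, am = 1 is manual.
group_means <- tapply(mtcars$mpg, mtcars$am, mean)

welch$p.value                                  # p-value of the test
unname(group_means["1"] - group_means["0"])    # manual minus automatic, in MPG
welch$conf.int                                 # 95% CI for automatic minus manual
```

The confidence interval is reported for the first group minus the second, so for this coding it describes automatic minus manual and is expected to be negative.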
Next, fit the initial model, which regresses mpg on all other variables.
intialModel <- lm(mpg ~ ., data=mtcars)
summary(intialModel)
The initial model has a residual standard error of 2.833 on 15 degrees of freedom. The adjusted R-squared is 0.779, meaning the model explains about 78% of the variance in mpg. However, none of the coefficients is significant at the 0.05 significance level.
Next, we use stepwise model selection to choose the predictors for a reduced model, stepModel. Setting k = log(n) makes step() apply the BIC penalty rather than the default AIC penalty.
stepModel <- step(intialModel, k=log(nrow(mtcars)), direction = "both")
summary(stepModel)
The selected model is “mpg ~ wt + qsec + am”. Its residual standard error is 2.459 on 28 degrees of freedom and its adjusted R-squared is 0.8336, meaning the model explains about 83% of the variance in MPG. All three predictors are significant at the 0.05 significance level.
Stepwise selection does not examine every combination of variables, so this adjusted R-squared is not guaranteed to be the maximum attainable, but it improves on the 0.779 of the initial model while using far fewer predictors.
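The role of k in step() can be sketched as follows. With k = log(n) the penalty per parameter matches BIC, so step() ranks candidate models the same way BIC() does. The example uses the built-in mtcars with all columns numeric (an assumption; the report converts some to factors), and `full`, `reduced` and the `score_*` names are illustrative.

```r
data(mtcars)
n <- nrow(mtcars)

full    <- lm(mpg ~ ., data = mtcars)
reduced <- lm(mpg ~ wt + qsec + am, data = mtcars)

# step() scores models with extractAIC(); k = log(n) gives a BIC-type score.
score_full    <- extractAIC(full, k = log(n))[2]
score_reduced <- extractAIC(reduced, k = log(n))[2]

# The scores differ from BIC() only by a constant, so the differences agree.
c(step_scale = score_full - score_reduced,
  bic_scale  = BIC(full) - BIC(reduced))
```

A positive difference means the reduced model has the lower (better) score on both scales.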
The scatter plot (see the Appendix: Figures section) suggests an interaction between “wt” and “am”, with automatic cars tending to weigh more than manual cars. Hence the model below includes the interaction term:
amIntWtModel<-lm(mpg ~ wt + qsec + am + wt:am, data=mtcars)
summary(amIntWtModel)
The model has a residual standard error of 2.084 on 27 degrees of freedom and an adjusted R-squared of 0.8804, meaning it explains about 88% of the variance in MPG. The coefficients of all four terms are significant at the 0.05 significance level.
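Because of the wt:am term, the estimated effect of a manual transmission is no longer a single number: it equals the am coefficient plus the interaction coefficient times the weight. A minimal sketch, assuming am is kept numeric (0/1) so that the coefficients are named `am` and `wt:am` (with am as a factor they would be `am1` and `wt:am1`); `fit`, `manual_effect` and `crossover` are illustrative names:

```r
data(mtcars)
fit <- lm(mpg ~ wt + qsec + am + wt:am, data = mtcars)
b <- coef(fit)

# Estimated MPG difference (manual minus automatic) at a given weight,
# holding qsec fixed. wt is measured in units of 1000 lbs.
manual_effect <- function(wt) unname(b["am"] + b["wt:am"] * wt)

manual_effect(c(2, 3, 4))

# Weight at which the estimated difference changes sign.
crossover <- unname(-b["am"] / b["wt:am"])
crossover
```

In this sketch the estimated advantage of a manual transmission shrinks as weight increases and reverses for the heaviest cars, so any single "manual vs. automatic" figure should be read with the weight in mind.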
Next, fit a model with MPG as the outcome variable and transmission as the only predictor.
amModel<-lm(mpg ~ am, data = mtcars)
summary(amModel)
On average, a car with an automatic transmission gets 17.147 mpg, and a manual transmission raises this by 7.245 mpg. The residual standard error is 4.902 on 30 degrees of freedom. The adjusted R-squared is 0.3385, meaning the model explains about 34% of the variance in MPG.
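With a single 0/1 predictor, the intercept is the mean mpg of the automatic cars and the slope is the difference between the two group means. A short check, assuming the built-in mtcars with numeric am; `am_fit`, `mean_auto` and `mean_manual` are illustrative names:

```r
data(mtcars)
am_fit <- lm(mpg ~ am, data = mtcars)

mean_auto   <- mean(mtcars$mpg[mtcars$am == 0])
mean_manual <- mean(mtcars$mpg[mtcars$am == 1])

# Intercept = mean of automatic cars; slope = manual mean minus automatic mean.
coef(am_fit)
c(mean_auto, mean_manual - mean_auto)
```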
Lastly, we compare the models and choose the final one.
anova(amModel, stepModel, intialModel, amIntWtModel)
confint(amIntWtModel)
We select the model with the highest adjusted R-squared, “mpg ~ wt + qsec + am + wt:am”.
summary(amIntWtModel)$coef
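The selection rule can be sketched by collecting the adjusted R-squared of each candidate in one vector. The list names mirror the report's model names. The factor conversion (cyl, vs, am, gear, carb) is an assumption, and the stepwise model is fitted directly from its formula here rather than through step().

```r
data(mtcars)
mt <- mtcars
# Assumed factor conversion (not shown in the report).
factor_vars <- c("cyl", "vs", "am", "gear", "carb")
mt[factor_vars] <- lapply(mt[factor_vars], factor)

models <- list(
  amModel      = lm(mpg ~ am, data = mt),
  stepModel    = lm(mpg ~ wt + qsec + am, data = mt),
  intialModel  = lm(mpg ~ ., data = mt),
  amIntWtModel = lm(mpg ~ wt + qsec + am + wt:am, data = mt)
)

# Adjusted R-squared for each model, sorted so the highest comes last.
adj_r2 <- sapply(models, function(m) summary(m)$adj.r.squared)
sort(adj_r2)
```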
In conclusion, we can say:
* Cars with manual transmission get more miles per gallon (mpg) than cars with automatic transmission.
* When the number of cylinders (cyl) increases from 4 to 6 and to 8, mpg decreases by about 3 and 2.2, respectively (adjusted for hp, wt, and am).
* mpg decreases negligibly as hp increases.
To check how strongly individual observations affect the regression coefficient estimates, we compute the dfbetas and count the values whose magnitude exceeds 1:
sum((abs(dfbetas(amIntWtModel)))>1)
## [1] 0
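dfbetas() returns one row per observation and one column per coefficient. Each entry is the standardized change in that coefficient when the observation is left out. The sketch below assumes the built-in mtcars with numeric am and uses the illustrative names `fit` and `infl`; it lists the largest absolute value per coefficient alongside the count.

```r
data(mtcars)
fit <- lm(mpg ~ wt + qsec + am + wt:am, data = mtcars)

# Matrix with 32 rows (observations) and 5 columns (coefficients).
infl <- dfbetas(fit)
dim(infl)

# Largest absolute dfbeta for each coefficient.
apply(abs(infl), 2, max)

# Number of entries above the cutoff of 1 used in the report.
sum(abs(infl) > 1)
```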
fig (i). Pair Plot of Motor Trend Auto characteristics
pairs(mtcars, panel=panel.smooth, main=" Pair Plot of Motor Trend Auto characteristics")
fig (ii). Boxplot of MPG vs. Transmission
boxplot(mpg ~ am, data = mtcars, xlab = "Transmission Type: (0 = Automatic, 1 = Manual)",
        ylab = "Miles Per Gallon", main = "MPG vs. Transmission")
fig (iii). Scatter Plot of MPG vs. Weight by Transmission
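# height and width are not point aesthetics, so they are dropped;
# factor(am) keeps the discrete colour scale valid even if am is numeric
ggplot(mtcars, aes(x = wt, y = mpg, color = factor(am))) + geom_point() +
  scale_colour_discrete(name = "Transmission", labels = c("Automatic", "Manual")) +
  xlab("weight") + ggtitle("Scatter Plot of MPG vs. Weight by Transmission")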
fig (iv). Residual Plots
par(mfrow = c(2, 2))
plot(amIntWtModel)
The residual plots support the following assumptions:
* The Residuals vs. Fitted plot shows no consistent pattern, which supports the independence condition.
* In the Normal Q-Q plot, the points mostly fall on the line, indicating that the residuals are approximately normally distributed.
* The points in the Scale-Location plot are randomly scattered, indicating constant variance.
* In the Residuals vs. Leverage plot, all points fall well within the 0.5 Cook's distance contours, which suggests that no influential outliers are present.
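The visual checks can be complemented with numeric ones. A minimal sketch, assuming the same interaction model fitted to the built-in mtcars (`fit` and `cooks` are illustrative names): shapiro.test() tests the residuals for normality, and cooks.distance() gives the quantity behind the 0.5 contour in the Residuals vs. Leverage plot.

```r
data(mtcars)
fit <- lm(mpg ~ wt + qsec + am + wt:am, data = mtcars)

# Shapiro-Wilk test: a large p-value gives no evidence against normality.
shapiro.test(residuals(fit))

# Cook's distance per observation; plot.lm() draws contours at 0.5 and 1.
cooks <- cooks.distance(fit)
head(sort(cooks, decreasing = TRUE), 3)
```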