We examine the mtcars data set to answer the following questions:
From the exploratory data analysis, a t-test indicates that manual transmission is associated with higher MPG than automatic transmission. A regression model (with predictor selection and outliers omitted) is built and explains ~ \(93\)% of the variance in MPG of the resulting data set. This model estimates that, holding the other predictors fixed, manual transmission improves MPG by ~ \(2.19\). Note: all figures are in the Appendix of this report.
# Libraries
library(leaps); library(ggplot2); library(car)
# Load data set
data(mtcars); mydata<- mtcars
# Assign factor class (am additionally gets readable labels: 0 = automatic, 1 = manual)
vars <- c('cyl','vs','gear','carb','am')
for (elt in vars) {
  if (elt != 'am') {
    mydata[,elt] <- factor(mydata[,elt])
  } else {
    mydata[,elt] <- factor(mydata[,elt], labels = c('automatic','manual'))
  }
}
Figure 1 shows the relationships between variables (e.g. correlations and frequency histograms). This figure suggests that manual transmission is associated with higher MPG than automatic transmission (as also shown in figure 2). A t-test is run below to check this observation.
Here, the null hypothesis is that mean mpg does not differ between the two transmission types (manual and automatic):
# T-Test on mpg from each transmission
T_test <- with(mydata,t.test(mpg~am))
# P-value and the 95% confidence interval
T_test$p.value; T_test$conf.int
## [1] 0.001373638
## [1] -11.280194 -3.209684
## attr(,"conf.level")
## [1] 0.95
The t-test above rejects the null hypothesis (p-value < 0.05 and the 95% confidence interval does not contain zero). The interval is for the automatic mean minus the manual mean, so its negative bounds indicate that mean MPG is higher for manual transmission, consistent with the exploratory observation. A regression model is built below to quantify this difference between transmission types.
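To make the sign of the confidence interval concrete, the following minimal sketch reruns the test on the raw mtcars data (where am is coded 0 = automatic, 1 = manual) and extracts the two group means. The variable names `tt`, `group_means`, `diff_means` and `ci` are introduced here for illustration only.

```r
# Welch two-sample t-test of mpg by transmission (am: 0 = automatic, 1 = manual)
data(mtcars)
tt <- t.test(mpg ~ am, data = mtcars)

# Group means; the first level (0, automatic) is listed first
group_means <- tt$estimate

# The reported interval is for mean(first group) - mean(second group),
# i.e. automatic minus manual
diff_means <- unname(group_means[1] - group_means[2])
ci <- as.numeric(tt$conf.int)

print(group_means)
print(diff_means)
print(ci)
```

A negative difference that lies inside an all-negative interval is what "manual has the higher mean MPG" looks like in this parameterisation.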
Best-subset regression is used to select predictors because it inspects all possible combinations of predictors. The goal is to build a regression model with the highest adjusted R-squared; adjusted R-squared is preferred over R-squared because R-squared never decreases when predictors are added. This technique is illustrated in figure 3 (obtained with the regsubsets() function), which shows that the regression model with only the intercept and the variable wt has the lowest adjusted R-squared value (~ \(0.74\)). The subset regression instead suggests using the predictors cyl, hp, wt, vs and am to obtain a high adjusted R-squared value.
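As a small illustration of why adjusted R-squared is used as the selection criterion, the sketch below recomputes it by hand for the wt-only model and compares it with the larger model. It uses base R only rather than regsubsets(), and the names `fit_wt`, `fit_big` and `adj_manual` are assumptions made for this example.

```r
data(mtcars)

# Smallest candidate: intercept + wt
fit_wt <- lm(mpg ~ wt, data = mtcars)
s_wt <- summary(fit_wt)

# Adjusted R-squared penalises the number of predictors p:
#   1 - (1 - R^2) * (n - 1) / (n - p - 1)
n <- nrow(mtcars)
p <- length(coef(fit_wt)) - 1
adj_manual <- 1 - (1 - s_wt$r.squared) * (n - 1) / (n - p - 1)

# Larger candidate with the selected predictors (cyl, vs, am as factors)
fit_big <- lm(mpg ~ factor(cyl) + hp + wt + factor(vs) + factor(am),
              data = mtcars)
s_big <- summary(fit_big)

print(c(wt_only = adj_manual, selected = s_big$adj.r.squared))
```

The hand-computed value agrees with the one reported by summary(), and the larger model keeps a higher adjusted R-squared despite the penalty for its extra coefficients.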
With the above selected predictors, the regression model Fit is as follows:
# Fit model
Fit <- lm(mpg~ cyl+hp+wt+vs+am, mydata)
The influence plot (figure 4, obtained with the influencePlot() function) shows that the 'Chrysler Imperial' is the most influential observation (largest circle), while the 'Maserati Bora' and 'Porsche 914-2' both have high leverage. Cars whose studentized residuals lie above +2 or below −2 (the vertical axis of the plot) are treated as outliers and omitted to fit a second regression model Fit_2 as follows:
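# Cars treated as outliers (stored under a new name so that vars is not overwritten)
outliers <- c('Fiat 128','Toyota Corolla','Chrysler Imperial','Volvo 142E','Datsun 710')
# Fit 2 model on the data without the outliers
Fit_2 <- lm(mpg~ cyl+hp+wt+vs+am, mydata[!(rownames(mydata) %in% outliers),])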
# Coefficients estimate and R-squared
round(summary(Fit_2)$coefficients[,1],2); round(summary(Fit_2)$r.squared,2)
## (Intercept) cyl6 cyl8 hp wt vs1
## 31.65 -2.25 -0.85 -0.03 -2.66 1.79
## ammanual
## 2.19
## [1] 0.93
The second model explains ~ \(93\)% of the variance in mpg for the data set without the omitted cars, which can be considered a good fit. Holding the other variables fixed, manual transmission increases MPG by ~ \(2.19\) (the ammanual coefficient) compared with automatic transmission. This is consistent with the exploratory data analysis, which suggested that manual transmission is better for MPG.
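The ±2 screening rule used to choose the omitted cars can also be applied programmatically instead of being read off the plot. The sketch below is one possible way to do it in base R, using rstudent() for the studentized residuals; the names `d`, `rs`, `flagged` and `fit_screened` are hypothetical, and the flagged set is not guaranteed to match the five cars listed in the report.

```r
data(mtcars)

# Same factor coding as in the report (cyl, vs, am as factors)
d <- transform(mtcars,
               cyl = factor(cyl),
               vs  = factor(vs),
               am  = factor(am, labels = c("automatic", "manual")))

fit <- lm(mpg ~ cyl + hp + wt + vs + am, data = d)

# Externally studentized residuals; flag cars beyond +/- 2
rs <- rstudent(fit)
flagged <- names(rs)[which(abs(rs) > 2)]

# Refit without the flagged cars and compare the transmission coefficient
fit_screened <- lm(mpg ~ cyl + hp + wt + vs + am,
                   data = d[!(rownames(d) %in% flagged), ])

print(flagged)
print(c(all_cars = unname(coef(fit)["ammanual"]),
        screened = unname(coef(fit_screened)["ammanual"])))
```

Comparing the ammanual estimate before and after screening gives a rough sense of how sensitive the reported effect is to the choice of omitted cars.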
In figure 5, the points in the Residuals vs. Fitted plot are randomly scattered, which supports the linearity condition. Similarly, the Normal Q-Q plot suggests that the residuals are approximately normally distributed, while the Scale-Location plot supports constant variance (homoscedasticity). Finally, the Residuals vs. Leverage plot shows that the great majority of points, including the high-leverage ones, are inliers.
The influence of transmission type (manual or automatic) on the dependent variable mpg of the mtcars data set is analysed with a t-test and a regression model. Manual transmission is found to be better for MPG, with an estimated improvement of ~ \(2.19\).
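The four diagnostic panels are drawn from a handful of per-observation quantities that can also be inspected directly. The sketch below rebuilds the second model from the list of omitted cars given earlier and collects those quantities in a data frame; the names `d`, `omitted`, `fit2` and `dx` are introduced only for this example.

```r
data(mtcars)

d <- transform(mtcars,
               cyl = factor(cyl),
               vs  = factor(vs),
               am  = factor(am, labels = c("automatic", "manual")))

# Cars omitted in the report
omitted <- c("Fiat 128", "Toyota Corolla", "Chrysler Imperial",
             "Volvo 142E", "Datsun 710")
fit2 <- lm(mpg ~ cyl + hp + wt + vs + am,
           data = d[!(rownames(d) %in% omitted), ])

# Quantities behind the four panels of plot(fit2)
dx <- data.frame(
  fitted    = fitted(fit2),          # x-axis of Residuals vs Fitted
  resid     = resid(fit2),           # y-axis of Residuals vs Fitted
  std_resid = rstandard(fit2),       # Normal Q-Q, Residuals vs Leverage
  leverage  = hatvalues(fit2),       # x-axis of Residuals vs Leverage
  cooks     = cooks.distance(fit2)   # contours in Residuals vs Leverage
)
dx$scale_loc <- sqrt(abs(dx$std_resid))  # y-axis of Scale-Location

# Most influential remaining cars by Cook's distance
print(head(dx[order(-dx$cooks), ], 3))

# Draw the panels only in an interactive session
if (interactive()) {
  op <- par(mfrow = c(2, 2))
  plot(fit2)
  par(op)
}
```

Sorting by Cook's distance is a quick numeric counterpart to reading the Residuals vs. Leverage panel by eye.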