This analysis is based on a data set of cars extracted from the 1974 Motor Trend US magazine, which relates a set of variables to miles per gallon (MPG). It shows how we answer the following two questions:
“Is an automatic or manual transmission better for MPG?”: The manual transmission is better than the automatic.
“Quantify the MPG difference between automatic and manual transmissions”: MPG increases by 2.9358 on average when changing from automatic to manual, holding the weight (wt) and the 1/4 mile time (qsec) constant.
We explore the MPG quantiles for automatic and manual cars. The automatic cars show lower values. The annexes include a boxplot that shows the difference graphically.
Automatic
quantile(mtcars[which(mtcars$am==0),]$mpg)
## 0% 25% 50% 75% 100%
## 10.40 14.95 17.30 19.20 24.40
Manual
quantile(mtcars[which(mtcars$am==1),]$mpg)
## 0% 25% 50% 75% 100%
## 15.0 21.0 22.8 30.4 33.9
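As a complement to the quantiles, the sketch below computes the unadjusted mean MPG per transmission type. It is only an illustration: the names `group_means` and `raw_diff` are introduced here and are not part of the original analysis. The raw gap (about 7.2 MPG) ignores every other variable, which is why the regression that follows adjusts for weight and 1/4 mile time.

```r
# Mean MPG by transmission (am: 0 = automatic, 1 = manual)
group_means <- aggregate(mpg ~ am, data = mtcars, FUN = mean)
print(group_means)

# Unadjusted difference: manual minus automatic
raw_diff <- group_means$mpg[group_means$am == 1] -
  group_means$mpg[group_means$am == 0]
print(raw_diff)
```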
We use the bestglm tool to find the recommended model. It selects as regressors the weight (wt), the 1/4 mile time (qsec) and the transmission (am). See the annexes for the bestglm calculation.
The annexes also contain an analysis of the standard errors of the coefficients. The standard errors are much lower in the recommended model, wt+qsec+am, than in the model using all the regressors. The anova comparison between the two nested models, wt+qsec+am and the model with all the variables, also shows that the model with all the regressors is not recommended, because the p-value is high (0.8636, see annexes): the extra regressors do not significantly improve the fit.
This is the model, its coefficients and the confidence intervals for the coefficients.
fitbest<-lm(mpg~wt+qsec+factor(am),data=mtcars)
summary(fitbest)
##
## Call:
## lm(formula = mpg ~ wt + qsec + factor(am), data = mtcars)
##
## Residuals:
## Min 1Q Median 3Q Max
## -3.4811 -1.5555 -0.7257 1.4110 4.6610
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 9.6178 6.9596 1.382 0.177915
## wt -3.9165 0.7112 -5.507 6.95e-06 ***
## qsec 1.2259 0.2887 4.247 0.000216 ***
## factor(am)1 2.9358 1.4109 2.081 0.046716 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.459 on 28 degrees of freedom
## Multiple R-squared: 0.8497, Adjusted R-squared: 0.8336
## F-statistic: 52.75 on 3 and 28 DF, p-value: 1.21e-11
confint(fitbest)
## 2.5 % 97.5 %
## (Intercept) -4.63829946 23.873860
## wt -5.37333423 -2.459673
## qsec 0.63457320 1.817199
## factor(am)1 0.04573031 5.825944
Looking at the coefficient for am, we can see that MPG increases on average by 2.9358 miles per gallon when we change from automatic transmission to manual transmission, holding the weight (wt) and the 1/4 mile time (qsec) constant. The R-squared of 0.85 indicates that the model explains about 85% of the variance. The 95% confidence interval for the change from automatic to manual runs from 0.046 MPG to 5.83 MPG. You can find residual plots in the annexes.
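The interval reported by confint for the transmission coefficient can be reproduced by hand as estimate ± t-quantile × standard error, using the 28 residual degrees of freedom. The following is a minimal sketch; the variable names (`est`, `se`, `t_crit`, `ci_am`) are chosen here for illustration.

```r
fitbest <- lm(mpg ~ wt + qsec + factor(am), data = mtcars)
coefs <- summary(fitbest)$coefficients

# Estimate and standard error of the manual-transmission effect
est <- coefs["factor(am)1", "Estimate"]
se <- coefs["factor(am)1", "Std. Error"]

# Two-sided 95% critical value of the t distribution
t_crit <- qt(0.975, df = df.residual(fitbest))

ci_am <- c(lower = est - t_crit * se, upper = est + t_crit * se)
print(ci_am)
```

The lower bound sits only just above zero, which is consistent with the p-value of 0.047 for this coefficient.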
The analysis concludes that the manual transmission is better than the automatic and that MPG increases by 2.9358 on average when changing from automatic to manual, holding the weight (wt) and the 1/4 mile time (qsec) constant.
Boxplot to show the difference in MPG between automatic and manual transmission.
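To make "holding wt and qsec constant" concrete, the sketch below predicts MPG for two hypothetical cars that differ only in transmission. The values wt = 3.2 and qsec = 17.8 are illustrative assumptions, not taken from the analysis; any other fixed pair gives the same gap, because the model has no interaction terms.

```r
fitbest <- lm(mpg ~ wt + qsec + factor(am), data = mtcars)

# Two hypothetical cars: same weight and 1/4 mile time,
# one automatic (am = 0) and one manual (am = 1)
newcars <- data.frame(wt = 3.2, qsec = 17.8, am = c(0, 1))
pred <- predict(fitbest, newdata = newcars)

# Predicted gap, manual minus automatic; equals the am coefficient
gap <- unname(pred[2] - pred[1])
print(gap)
```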
boxplot(mtcars[mtcars$am == 0, ]$mpg, mtcars[mtcars$am ==1, ]$mpg, names = c("Automatic", "Manual"))
Bestglm selection
library(bestglm)
# bestglm expects the response (mpg) in the last column
mtcars_forbestglm <- data.frame(mtcars[,2:11],mtcars[,1])
bestglm(mtcars_forbestglm)
## BIC
## BICq equivalent for q in (0.447079166022759, 0.697804921528401)
## Best Model:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 9.617781 6.9595930 1.381946 1.779152e-01
## wt -3.916504 0.7112016 -5.506882 6.952711e-06
## qsec 1.225886 0.2886696 4.246676 2.161737e-04
## am 2.935837 1.4109045 2.080819 4.671551e-02
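The output above reports BIC as the selection criterion. As a rough cross-check that needs only base R (it does not replace the subset search done by bestglm), BIC() can be called on the selected model and on the model with all regressors; the lower value is preferred. The names `bic_best` and `bic_all` are introduced here for illustration.

```r
fitbest <- lm(mpg ~ wt + qsec + factor(am), data = mtcars)
fitall <- lm(mpg ~ ., data = mtcars)

# Lower BIC is better; the penalty grows with the number of coefficients
bic_best <- BIC(fitbest)
bic_all <- BIC(fitall)
print(c(best = bic_best, all = bic_all))
```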
Comparing the standard errors of the coefficients between the model recommended by bestglm and the model using all the variables. Anova analysis as nested models.
fitall<-lm(mpg~.,data=mtcars)
fitbest<-lm(mpg~wt+qsec+factor(am),data=mtcars)
summary(fitall)$coef
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 12.30337416 18.71788443 0.6573058 0.51812440
## cyl -0.11144048 1.04502336 -0.1066392 0.91608738
## disp 0.01333524 0.01785750 0.7467585 0.46348865
## hp -0.02148212 0.02176858 -0.9868407 0.33495531
## drat 0.78711097 1.63537307 0.4813036 0.63527790
## wt -3.71530393 1.89441430 -1.9611887 0.06325215
## qsec 0.82104075 0.73084480 1.1234133 0.27394127
## vs 0.31776281 2.10450861 0.1509915 0.88142347
## am 2.52022689 2.05665055 1.2254035 0.23398971
## gear 0.65541302 1.49325996 0.4389142 0.66520643
## carb -0.19941925 0.82875250 -0.2406258 0.81217871
summary(fitbest)$coef
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 9.617781 6.9595930 1.381946 1.779152e-01
## wt -3.916504 0.7112016 -5.506882 6.952711e-06
## qsec 1.225886 0.2886696 4.246676 2.161737e-04
## factor(am)1 2.935837 1.4109045 2.080819 4.671551e-02
anova(fitbest,fitall)
## Analysis of Variance Table
##
## Model 1: mpg ~ wt + qsec + factor(am)
## Model 2: mpg ~ cyl + disp + hp + drat + wt + qsec + vs + am + gear + carb
## Res.Df RSS Df Sum of Sq F Pr(>F)
## 1 28 169.29
## 2 21 147.49 7 21.791 0.4432 0.8636
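The F statistic in the table above can be rebuilt from the residual sums of squares of the two nested models: the drop in RSS per added regressor, divided by the residual variance of the larger model. This is a minimal sketch with variable names chosen for illustration.

```r
fitbest <- lm(mpg ~ wt + qsec + factor(am), data = mtcars)
fitall <- lm(mpg ~ ., data = mtcars)

# Residual sums of squares and residual degrees of freedom
rss_best <- sum(resid(fitbest)^2)
rss_all <- sum(resid(fitall)^2)
df_best <- df.residual(fitbest)  # 28
df_all <- df.residual(fitall)    # 21

# F = ((RSS_small - RSS_big) / extra terms) / (RSS_big / df_big)
extra <- df_best - df_all
f_stat <- ((rss_best - rss_all) / extra) / (rss_all / df_all)
p_value <- pf(f_stat, extra, df_all, lower.tail = FALSE)
print(c(F = f_stat, p = p_value))
```

With the rounded figures from the table, this is (21.791 / 7) / (147.49 / 21), which gives approximately 0.443.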
Residual plots
par(mfrow=c(2,2),oma=c(1,1,4,1),mar=c(4,4,2,2))
plot(fitbest,1:4)
title(main="Residuals For Multivariate Regression Model",outer=TRUE,cex.main=1.5)