This work examines the mtcars dataset, focusing on the relationship between miles per gallon (MPG) and the other variables. With these relationships we address two questions: (I) the effect of automatic versus manual transmission on MPG, and (II) the size of the MPG difference between automatic and manual transmissions. We conclude that manual transmission is better than automatic transmission, with about 42% higher MPG on average.
View the first 10 cars.
head(mtcars, 10)
Plot the data to visualize the distribution of values.
plot(mtcars)
Summarize all variables.
library(knitr)
data(mtcars)
kable(summary(mtcars[1:5]))
kable(summary(mtcars[6:11]))
mpg | cyl | disp | hp | drat
---|---|---|---|---
Min. :10.40 | Min. :4.000 | Min. : 71.1 | Min. : 52.0 | Min. :2.760
1st Qu.:15.43 | 1st Qu.:4.000 | 1st Qu.:120.8 | 1st Qu.: 96.5 | 1st Qu.:3.080
Median :19.20 | Median :6.000 | Median :196.3 | Median :123.0 | Median :3.695
Mean :20.09 | Mean :6.188 | Mean :230.7 | Mean :146.7 | Mean :3.597
3rd Qu.:22.80 | 3rd Qu.:8.000 | 3rd Qu.:326.0 | 3rd Qu.:180.0 | 3rd Qu.:3.920
Max. :33.90 | Max. :8.000 | Max. :472.0 | Max. :335.0 | Max. :4.930
wt | qsec | vs | am | gear | carb
---|---|---|---|---|---
Min. :1.513 | Min. :14.50 | Min. :0.0000 | Min. :0.0000 | Min. :3.000 | Min. :1.000
1st Qu.:2.581 | 1st Qu.:16.89 | 1st Qu.:0.0000 | 1st Qu.:0.0000 | 1st Qu.:3.000 | 1st Qu.:2.000
Median :3.325 | Median :17.71 | Median :0.0000 | Median :0.0000 | Median :4.000 | Median :2.000
Mean :3.217 | Mean :17.85 | Mean :0.4375 | Mean :0.4062 | Mean :3.688 | Mean :2.812
3rd Qu.:3.610 | 3rd Qu.:18.90 | 3rd Qu.:1.0000 | 3rd Qu.:1.0000 | 3rd Qu.:4.000 | 3rd Qu.:4.000
Max. :5.424 | Max. :22.90 | Max. :1.0000 | Max. :1.0000 | Max. :5.000 | Max. :8.000
Check the data type of the variables (the columns are concatenated into a single vector, so one class is reported):
class(c(mtcars$mpg, mtcars$cyl, mtcars$cyl, mtcars$drat, mtcars$gear, mtcars$wt, mtcars$wt, mtcars$qsec, mtcars$am))
## [1] "numeric"
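Because `c()` merges the columns into one vector, the call above reports a single class. A minimal sketch of a per-column check follows, assuming the unmodified dataset from the `datasets` package; the name `col_classes` is illustrative and not part of the original analysis.

```r
# Class of each column, checked one variable at a time.
# `col_classes` is an illustrative name, not part of the original analysis.
col_classes <- sapply(datasets::mtcars, class)
print(col_classes)
```

Each of the 11 columns is reported separately, which makes it easier to spot a variable that is stored as numeric but is categorical in nature.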
Because several variables are stored as numeric values but take only a few distinct levels, and the number of cases is small (32), we convert them to factors to improve the analysis:
mtcars$cyl <- factor(mtcars$cyl)
mtcars$vs <- factor(mtcars$vs)
mtcars$gear <- factor(mtcars$gear)
mtcars$carb <- factor(mtcars$carb)
mtcars$am <- factor(mtcars$am,labels=c("Automatic","Manual"))
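As a quick sanity check on the conversion, the sketch below works on a copy of the data and cross-tabulates the original transmission codes against the new labels. The names `cars`, `am_raw`, and `mapping` are assumptions made for illustration. `factor()` assigns `labels` in the sorted order of the original values, so 0 should map to "Automatic" and 1 to "Manual".

```r
# Work on a copy so the session's mtcars object is left untouched.
cars <- datasets::mtcars
am_raw <- cars$am  # original 0/1 coding

cars$am <- factor(cars$am, labels = c("Automatic", "Manual"))

# Rows: original codes; columns: new labels.
# All counts should fall on the diagonal if the labels are assigned as intended.
mapping <- table(raw = am_raw, label = cars$am)
print(mapping)
```

If the off-diagonal cells are zero, the labels line up with the original coding.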
Apply linear regression to identify the effect of the predictor variables on MPG.
fit1 <- lm(mpg ~ am, data = mtcars)
summary(fit1)
##
## Call:
## lm(formula = mpg ~ am, data = mtcars)
##
## Residuals:
## Min 1Q Median 3Q Max
## -9.3923 -3.0923 -0.2974 3.2439 9.5077
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 17.147 1.125 15.247 1.13e-15 ***
## amManual 7.245 1.764 4.106 0.000285 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 4.902 on 30 degrees of freedom
## Multiple R-squared: 0.3598, Adjusted R-squared: 0.3385
## F-statistic: 16.86 on 1 and 30 DF, p-value: 0.000285
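With a single two-level factor as predictor, the coefficients in the output above can be read as group means. The sketch below illustrates this under the assumption that the model is refitted on a copy of the data; the names `cars`, `fit`, `group_means`, and `r2` are illustrative. It also recomputes R-squared as 1 - RSS/TSS.

```r
# Refit the single-predictor model on a copy of the data.
cars <- datasets::mtcars
cars$am <- factor(cars$am, labels = c("Automatic", "Manual"))
fit <- lm(mpg ~ am, data = cars)

# Mean MPG per transmission type.
group_means <- tapply(cars$mpg, cars$am, mean)

# Intercept = mean of the reference level (Automatic).
# Intercept + amManual = mean of the Manual group.
intercept <- coef(fit)[["(Intercept)"]]
slope <- coef(fit)[["amManual"]]
print(all.equal(intercept, group_means[["Automatic"]]))
print(all.equal(intercept + slope, group_means[["Manual"]]))

# R-squared from its definition: 1 - residual SS / total SS.
rss <- sum(residuals(fit)^2)
tss <- sum((cars$mpg - mean(cars$mpg))^2)
r2 <- 1 - rss / tss
print(r2)
```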
The results indicate that automatic vehicles average 17.1 MPG, while manual vehicles average 7.2 MPG more. The multiple R-squared is 0.36 (adjusted 0.34), so this model explains only about 36% of the variance in the outcome variable, which is low explanatory power. To improve on this, we fit a model with multiple predictors; adding variables aims to refine the model and gain explanatory power.
fit2 <- lm(mpg~am + cyl + disp + hp + wt, data = mtcars)
summary(fit2)
##
## Call:
## lm(formula = mpg ~ am + cyl + disp + hp + wt, data = mtcars)
##
## Residuals:
## Min 1Q Median 3Q Max
## -3.9374 -1.3347 -0.3903 1.1910 5.0757
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 33.864276 2.695416 12.564 2.67e-12 ***
## amManual 1.806099 1.421079 1.271 0.2155
## cyl6 -3.136067 1.469090 -2.135 0.0428 *
## cyl8 -2.717781 2.898149 -0.938 0.3573
## disp 0.004088 0.012767 0.320 0.7515
## hp -0.032480 0.013983 -2.323 0.0286 *
## wt -2.738695 1.175978 -2.329 0.0282 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.453 on 25 degrees of freedom
## Multiple R-squared: 0.8664, Adjusted R-squared: 0.8344
## F-statistic: 27.03 on 6 and 25 DF, p-value: 8.861e-10
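The adjusted R-squared in the summary above penalizes R-squared for the number of estimated coefficients. A minimal sketch of that adjustment, assuming the same model refitted on a copy of the data (the names `cars`, `fit`, `n`, `p`, and `adj_manual` are illustrative):

```r
# Refit the multi-predictor model on a copy of the data.
cars <- datasets::mtcars
cars$cyl <- factor(cars$cyl)
cars$am <- factor(cars$am, labels = c("Automatic", "Manual"))
fit <- lm(mpg ~ am + cyl + disp + hp + wt, data = cars)
s <- summary(fit)

n <- nrow(cars)              # number of observations
p <- length(coef(fit)) - 1   # number of coefficients, excluding the intercept

# Adjusted R-squared: 1 - (1 - R^2) * (n - 1) / (n - p - 1)
adj_manual <- 1 - (1 - s$r.squared) * (n - 1) / (n - p - 1)
print(c(manual = adj_manual, reported = s$adj.r.squared))
```

The two printed values should agree, which shows where the gap between the multiple and adjusted R-squared comes from.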
Here the adjusted R-squared (0.83) is better than in the previous model: this model explains about 83% of the variance in the outcome variable. To confirm the improvement, we apply an ANOVA test.
options(scipen = 999)
anova(fit1, fit2)
The p-value of 0.00000008636804 (8.637e-08) indicates that the second model has a significant gain in explanatory power.
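For nested linear models, `anova()` compares the residual sums of squares with an F-test. The sketch below recomputes that statistic by hand as an illustration, refitting both models on a copy of the data; the names `cars`, `m1`, `m2`, `f_stat`, and `p_val` are assumptions, not objects from the analysis above.

```r
# Refit both nested models on a copy of the data.
cars <- datasets::mtcars
cars$cyl <- factor(cars$cyl)
cars$am <- factor(cars$am, labels = c("Automatic", "Manual"))
m1 <- lm(mpg ~ am, data = cars)
m2 <- lm(mpg ~ am + cyl + disp + hp + wt, data = cars)

# Residual sums of squares and residual degrees of freedom.
rss1 <- deviance(m1); df1 <- df.residual(m1)
rss2 <- deviance(m2); df2 <- df.residual(m2)

# F = (drop in RSS per added coefficient) / (residual variance of the larger model)
f_stat <- ((rss1 - rss2) / (df1 - df2)) / (rss2 / df2)
p_val <- pf(f_stat, df1 - df2, df2, lower.tail = FALSE)

# Compare with the table produced by anova().
tab <- anova(m1, m2)
print(c(F_manual = f_stat, F_anova = tab$F[2]))
print(c(p_manual = p_val, p_anova = tab[["Pr(>F)"]][2]))
```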
boxplot(mpg ~ am, data = mtcars, col = c("yellow", "pink"), xlab = "Transmission Type", ylab = "MPG", main = "Distribution MPG vs. Transmission")
At this point we conclude that manual transmission is better than automatic, because its MPG is higher.
We retrieve the mean MPG of the two transmission types to observe the difference between their results.
ag = aggregate(mtcars$mpg,by=list(mtcars$am),FUN=mean)
ag
ag$x[2]/ag$x[1]
## [1] 1.42251
We now conclude that the manual transmission is (on average) 42% more efficient than the automatic transmission.
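The ratio of means gives the size of the difference but not its uncertainty. As a hedged illustration, the sketch below expresses the ratio as a percentage and adds a Welch two-sample t-test. The t-test is an addition for illustration and not part of the original analysis; the names `cars`, `means`, `pct_gain`, and `tt` are assumptions.

```r
# Work on a copy of the data with a labelled transmission factor.
cars <- datasets::mtcars
cars$am <- factor(cars$am, labels = c("Automatic", "Manual"))

# Mean MPG per transmission type and the relative gain of Manual, in percent.
means <- tapply(cars$mpg, cars$am, mean)
pct_gain <- (means[["Manual"]] / means[["Automatic"]] - 1) * 100
print(round(pct_gain, 1))

# Welch two-sample t-test (the default of t.test, unequal variances).
# The interval is for mean(Automatic) - mean(Manual), so negative values
# mean that Manual has the higher MPG.
tt <- t.test(mpg ~ am, data = cars)
print(tt$conf.int)
print(tt$p.value)
```

This comparison does not adjust for weight, horsepower, or cylinders, so it describes the raw difference only.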
boxplot(mpg ~ cyl, data=mtcars, col=(c("yellow","pink", "red")), xlab="Cylinders", ylab="MPG", main="Distribution Cylinders vs. MPG")
# disp, hp and wt are continuous, so scatter plots are used instead of one box per distinct value
plot(mpg ~ disp, data = mtcars, col = "green", xlab = "Displacement", ylab = "MPG", main = "Displacement vs. MPG")
plot(mpg ~ hp, data = mtcars, xlab = "HP", ylab = "MPG", main = "HP vs. MPG")
plot(mpg ~ wt, data = mtcars, xlab = "WT", col = "red", ylab = "MPG", main = "WT vs. MPG")
# Build a numeric data frame for the correlation analysis; factors are converted to their integer codes
mtcars_used <- data.frame(MPG = mtcars$mpg, AM = as.numeric(mtcars$am), CYL = as.numeric(mtcars$cyl), DISP = mtcars$disp, HP = mtcars$hp, WT = mtcars$wt)
head(mtcars_used)
pairs(mtcars_used, panel = panel.smooth, col = 9)
cor(mtcars_used)
## MPG AM CYL DISP HP WT
## MPG 1.0000000 0.5998324 -0.8521620 -0.8475514 -0.7761684 -0.8676594
## AM 0.5998324 1.0000000 -0.5226070 -0.5912270 -0.2432043 -0.6924953
## CYL -0.8521620 -0.5226070 1.0000000 0.9020329 0.8324475 0.7824958
## DISP -0.8475514 -0.5912270 0.9020329 1.0000000 0.7909486 0.8879799
## HP -0.7761684 -0.2432043 0.8324475 0.7909486 1.0000000 0.6587479
## WT -0.8676594 -0.6924953 0.7824958 0.8879799 0.6587479 1.0000000
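To read the first row of the matrix above more easily, the predictors can be ranked by the absolute value of their correlation with MPG. The sketch below is a minimal illustration using the original numeric columns of the dataset, which give the same correlations as the integer-coded factors because the coding is a linear transformation. The names `cars`, `cm`, and `mpg_cor` are illustrative.

```r
# Original numeric columns for the variables used in the analysis.
cars <- datasets::mtcars[, c("mpg", "am", "cyl", "disp", "hp", "wt")]
cm <- cor(cars)

# Correlation of each predictor with mpg, ranked by absolute value.
mpg_cor <- sort(abs(cm["mpg", -1]), decreasing = TRUE)
print(round(mpg_cor, 3))
```

Weight, cylinders, and displacement come out ahead of transmission type, which is consistent with the transmission coefficient shrinking once those variables enter the model.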
par(mfrow = c(2,2))
plot(fit1)
par(mfrow = c(2,2))
plot(fit2)
…