library('dplyr')
library('ggplot2')
data('mtcars')
# Render timestamp for the footer (note: this name masks stats::time())
time <- format(Sys.time(), "%a %b %d %X %Y")
# Generate summary statistics of mpg for the whole 'mtcars' dataset
dat.summary <- mtcars %>%
  summarise(
    avg = mean(mpg),
    stdev.mpg = sd(mpg),
    var.mpg = var(mpg),
    min.mpg = min(mpg),
    max.mpg = max(mpg)
  )
# Subset the automatic-transmission cars (am == 0)
cars.auto <- mtcars[mtcars$am == 0, ]
# Summary statistics of mpg for automatic-transmission cars
auto.sum <- cars.auto %>%
  summarise(
    avg = mean(mpg),
    stdev.mpg = sd(mpg),
    var.mpg = var(mpg),
    min.mpg = min(mpg),
    max.mpg = max(mpg)
  )
# Subset the manual-transmission cars (am == 1)
cars.man <- mtcars[mtcars$am == 1, ]
# Summary statistics of mpg for manual-transmission cars
man.sum <- cars.man %>%
  summarise(
    avg = mean(mpg),
    stdev.mpg = sd(mpg),
    var.mpg = var(mpg),
    min.mpg = min(mpg),
    max.mpg = max(mpg)
  )
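The two subset-then-summarise steps above can be collapsed into one grouped summary. This is a minimal sketch, assuming dplyr's `group_by()`; the object name `by.trans` is hypothetical. It should give one row per value of `am`, matching `auto.sum` (am = 0) and `man.sum` (am = 1).

```r
library(dplyr)

# Same statistics as auto.sum and man.sum, computed in one pass:
# group_by(am) splits mtcars into am == 0 (automatic) and am == 1 (manual)
by.trans <- mtcars %>%
  group_by(am) %>%
  summarise(
    avg = mean(mpg),
    stdev.mpg = sd(mpg),
    var.mpg = var(mpg),
    min.mpg = min(mpg),
    max.mpg = max(mpg)
  )
by.trans
```

Grouping avoids duplicating the `summarise()` call and keeps both groups in a single table for comparison.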
Exploratory summary statistics for overall mpg: the average is about 20.1 mpg with a standard deviation of about 6.0 mpg.
dat.summary
## avg stdev.mpg var.mpg min.mpg max.mpg
## 1 20.09062 6.026948 36.3241 10.4 33.9
Exploratory summary statistics of mpg for automatic-transmission vehicles: the average is about 17.1 mpg with a standard deviation of about 3.8 mpg.
auto.sum
## avg stdev.mpg var.mpg min.mpg max.mpg
## 1 17.14737 3.833966 14.6993 10.4 24.4
Exploratory summary statistics of mpg for manual-transmission vehicles: the average is about 24.4 mpg with a standard deviation of about 6.2 mpg.
man.sum
## avg stdev.mpg var.mpg min.mpg max.mpg
## 1 24.39231 6.166504 38.02577 15 33.9
## Exploratory inference and regression approach
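# Outcome (mpg) and predictor (am: 0 = automatic, 1 = manual)
y <- mtcars$mpg
x <- mtcars$am
# Simple linear regression of mpg on transmission type
fit.1 <- lm(y ~ x)
# Residuals (used in the residual plot below) and fitted values
e <- resid(fit.1)
yhat <- predict(fit.1)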
A correlation comparison of mpg with the other variables gives an idea of which factors could also influence mpg. mpg is positively correlated with 'drat', 'qsec', 'vs', 'am' (transmission), and 'gear', and negatively correlated with the rest ('cyl', 'disp', 'hp', 'wt', 'carb'). Several of those negative correlations ('wt', 'cyl', 'disp', 'hp') are stronger in magnitude than the correlation with 'am'.
cor(mtcars)["mpg", ] # Keep only the mpg row: correlations of mpg with every variable
## mpg cyl disp hp drat wt
## 1.0000000 -0.8521620 -0.8475514 -0.7761684 0.6811719 -0.8676594
## qsec vs am gear carb
## 0.4186840 0.6640389 0.5998324 0.4802848 -0.5509251
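Ranking the correlations by absolute value makes the comparison above easier to read. A short sketch in base R; `mpg.cor` and `mpg.cor.ranked` are hypothetical names. Based on the printed correlations, 'wt' should come first and 'am' seventh of the ten predictors.

```r
# Correlation of mpg with every other column (drop the mpg-mpg entry)
mpg.cor <- cor(mtcars)["mpg", -1]
# Order predictors by strength of association, ignoring sign
mpg.cor.ranked <- mpg.cor[order(abs(mpg.cor), decreasing = TRUE)]
mpg.cor.ranked
```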
# Model 1: mpg explained by transmission type alone
fit <- lm(mpg ~ am, data = mtcars)
summary(fit)$coefficients
summary(fit)$r.squared
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 17.147368 1.124603 15.247492 1.133983e-15
## am 7.244939 1.764422 4.106127 2.850207e-04
## [1] 0.3597989
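Because 'am' is coded 0/1, the intercept of this model is the mean mpg of the automatic cars (17.147) and the slope is the manual-minus-automatic difference in means (24.392 − 17.147 = 7.245). A quick check in base R; `grp.means` is a hypothetical name.

```r
fit <- lm(mpg ~ am, data = mtcars)
# Group means of mpg for automatic (am = 0) and manual (am = 1)
grp.means <- tapply(mtcars$mpg, mtcars$am, mean)
# Intercept = automatic mean; slope = manual mean minus automatic mean
coef(fit)
c(intercept = grp.means[["0"]], diff = grp.means[["1"]] - grp.means[["0"]])
```

So the 7.24 mpg coefficient is exactly the raw difference between the group averages in the summary tables above.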
# Model 2: adjust for weight (wt) and number of forward gears (gear)
fit2 <- lm(mpg ~ am + wt + gear, data = mtcars)
summary(fit2)$coefficients
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 39.2114278 5.2848951 7.4195280 4.427762e-08
## am 0.5937500 2.1008563 0.2826229 7.795447e-01
## wt -5.3797696 0.8017495 -6.7100382 2.770328e-07
## gear -0.5570034 1.2619415 -0.4413861 6.623234e-01
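A nested-model F-test is one way to make "adding more factors" concrete. This sketch uses base R `anova()` on two `lm()` fits (for a Gaussian outcome, `lm()` and the default `glm()` give the same coefficients); it also checks how strongly transmission and weight are related, which is one plausible reason the 'am' coefficient shrinks once 'wt' enters the model.

```r
fit  <- lm(mpg ~ am, data = mtcars)
fit2 <- lm(mpg ~ am + wt + gear, data = mtcars)
# F-test: do wt and gear jointly improve on the transmission-only model?
cmp <- anova(fit, fit2)
cmp
# Correlation between transmission and weight (manual cars tend to be lighter)
cor(mtcars$am, mtcars$wt)
```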
As we added more variables to the model, transmission had much less effect on mpg: after adjusting for weight ('wt') and number of gears ('gear'), the 'am' coefficient falls from 7.24 to 0.59 mpg and is no longer statistically significant (p ≈ 0.78), while 'wt' remains highly significant. We therefore tentatively conclude that transmission may have a small effect on mpg, but we did not test enough of the other predictors to settle the question.

Looking solely at automatic versus manual transmission, there appears to be an effect on mpg: manual cars average about 7.2 mpg more than automatics (24.4 vs 17.1 mpg). In the box plot below, the thick black lines mark the group medians (17.3 vs 22.8 mpg), not the means.
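So there is clearly a difference in the sample, but is it statistically significant? The transmission-only model above (p ≈ 0.0003 for 'am') suggests it is, at least before adjusting for other variables.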
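A direct way to test the difference in means, offered here as an alternative rather than the author's method, is a Welch two-sample t-test (the default in `t.test()`), which does not assume equal variances; the group standard deviations above (3.8 vs 6.2 mpg) do differ. The name `tt` is hypothetical.

```r
# Welch two-sample t-test of mpg by transmission type (unequal variances)
tt <- t.test(mpg ~ am, data = mtcars)
tt$p.value
# 95% CI for mean(automatic) - mean(manual); negative values favour manual
tt$conf.int
```

Unlike the regression p-value, which pools the residual variance across both groups, this test estimates each group's variance separately.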
# Box plot of mpg by transmission type ('am' treated as a factor)
plot(mpg ~ as.factor(am), data = mtcars,
     xlab = "Transmission (0 = Automatic, 1 = Manual)", ylab = "mpg",
     main = "MPG for Auto vs Manual Transmissions")
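'ggplot2' is loaded at the top but never used. A sketch of the same box plot in ggplot2, assuming version 3.3.0 or later (for the `fun =` argument of `stat_summary()`), with the group means overlaid as red points so they can be compared with the medians; `p` is a hypothetical name.

```r
library(ggplot2)

# Box plot of mpg by transmission; red points mark the group means
p <- ggplot(mtcars, aes(x = factor(am, labels = c("Auto", "Manual")), y = mpg)) +
  geom_boxplot() +
  stat_summary(fun = mean, geom = "point", colour = "red", size = 3) +
  labs(x = "Transmission", y = "mpg",
       title = "MPG for Auto vs Manual Transmissions")
print(p)
```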
# Residual plot: residuals from fit.1 (mpg ~ am) against observed mpg,
# with a horizontal reference line at zero
plot(mtcars$mpg, e, xlab = "mpg", ylab = "Residual",
     main = "Residuals of mpg Regressed on Transmission (am)",
     bg = "lightblue",
     col = "black",
     cex = 2,
     pch = 21,
     frame.plot = FALSE)
abline(h = 0, lwd = 2)
# Red vertical segment from each residual to the zero line
segments(mtcars$mpg, e, mtcars$mpg, 0, col = "red", lwd = 2)
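With a single 0/1 predictor, each residual is just a car's mpg minus its group's mean, so plotting residuals against observed mpg puts the points on two parallel lines of slope 1; residuals against 'am' or against the fitted values are the more usual diagnostic. The sketch below checks these properties in base R.

```r
fit.1 <- lm(mpg ~ am, data = mtcars)
e <- resid(fit.1)
# With an intercept, OLS residuals sum to (numerically) zero
sum(e)
# With a 0/1 predictor, residuals also average zero within each group
tapply(e, mtcars$am, mean)
# Residual spread by group equals stdev.mpg in auto.sum and man.sum
tapply(e, mtcars$am, sd)
```

The unequal residual spread (about 3.8 vs 6.2 mpg) is visible in the plot as the wider scatter among manual cars.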
This HTML report was generated with knitr on Mon Feb 01 8:37:29 AM 2016.