According to the Wald tests in the summary output, Alcohol, Cigarettes, and Sex are all significant at the 0.05 level. Cigarettes has the largest z statistic (17.40) and is the most significant variable, followed by Alcohol (z = 6.42). Race does not meet the 0.05 threshold (p = 0.138) and appears to be the least influential variable.
The coefficients for each variable will be interpreted in the context of the problem later.
# Two-column response (used, did not use): each student is a single binomial trial
muse <- cbind(mdata$Marijuanna, 1 - mdata$Marijuanna)
mmod1 <- glm(muse~Alcohol+Cigarettes+Sex+Race, family = binomial, data = mdata)
summary(mmod1)
##
## Call:
## glm(formula = muse ~ Alcohol + Cigarettes + Sex + Race, family = binomial,
## data = mdata)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -1.4908 -0.4701 -0.1084 0.8935 3.3070
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -5.1344 0.4775 -10.752 < 2e-16 ***
## Alcohol 2.9873 0.4655 6.417 1.39e-10 ***
## Cigarettes 2.8592 0.1643 17.401 < 2e-16 ***
## Sex -0.3297 0.1026 -3.212 0.00132 **
## Race -0.2989 0.2015 -1.483 0.13796
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 3099.3 on 2275 degrees of freedom
## Residual deviance: 2243.4 on 2271 degrees of freedom
## AIC: 2253.4
##
## Number of Fisher Scoring iterations: 7
exp(coef(mmod1))
## (Intercept) Alcohol Cigarettes Sex Race
## 0.005890848 19.832202285 17.447305099 0.719162324 0.741647834
After exponentiating the coefficients, we can see their multiplicative effect on the odds of a student using marijuana. The intercept tells us that a white male student who neither drinks alcohol nor smokes cigarettes has very small estimated odds of using marijuana: only 0.00589, which corresponds to a probability of about 0.6%.
Alcohol: Holding all else constant, if a student drinks alcohol, their odds of smoking marijuana increase by a factor of 19.832.
Cigarettes: Holding all else constant, if a student smokes cigarettes, their odds of smoking marijuana increase by a factor of 17.447.
Sex: (reminder, 1 = female) Holding all else constant, if a student is female, their odds of smoking marijuana change by a factor of 0.719. In other words, holding all else constant, the odds of smoking marijuana for a male are about 1.39 times higher than for a female.
Race: (reminder, 1 = not white) Holding all else constant, if a student is not white, their odds of smoking marijuana change by a factor of 0.742. In other words, holding all else constant, the odds of smoking marijuana for a white student are about 1.35 times higher than for a non-white student.
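Odds ratios can be hard to picture, so it may help to turn the fitted log odds into probabilities for a few student profiles. The sketch below hard-codes the rounded estimates from the summary output above (an assumption: four-decimal rounding is close enough) and uses base R's `plogis()`, the inverse logit. The profile names are hypothetical labels chosen for illustration.

```r
# Coefficient estimates copied (rounded) from summary(mmod1) above
beta <- c(Intercept = -5.1344, Alcohol = 2.9873, Cigarettes = 2.8592,
          Sex = -0.3297, Race = -0.2989)

# Hypothetical student profiles: columns follow beta
# (1 = drinks / smokes / female / not white)
profiles <- rbind(
  neither_white_male = c(1, 0, 0, 0, 0),
  alcohol_only       = c(1, 1, 0, 0, 0),
  cigarettes_only    = c(1, 0, 1, 0, 0),
  both_white_male    = c(1, 1, 1, 0, 0)
)

# Linear predictor (log odds) for each profile, then the inverse logit
log_odds <- drop(profiles %*% beta)
prob <- plogis(log_odds)  # plogis(x) = 1 / (1 + exp(-x))
round(cbind(odds = exp(log_odds), prob = prob), 4)
```

The last profile shows how the two multiplicative effects compound: a white male who both drinks and smokes has estimated odds of about 2 and a probability of roughly 0.67, compared with about 0.006 for the baseline student.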
exp(-coef(mmod1))
## (Intercept) Alcohol Cigarettes Sex Race
## 169.75485475 0.05042304 0.05731544 1.39050666 1.34834885
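The point estimates above say nothing about their uncertainty. A minimal sketch of 95% Wald confidence intervals for the odds ratios is shown below, assuming the usual normal approximation exp(estimate ± 1.96 × SE) and using the estimates and standard errors copied from the summary table.

```r
# Estimates and standard errors copied from summary(mmod1) above
est <- c(Alcohol = 2.9873, Cigarettes = 2.8592, Sex = -0.3297, Race = -0.2989)
se  <- c(Alcohol = 0.4655, Cigarettes = 0.1643, Sex = 0.1026, Race = 0.2015)

z <- qnorm(0.975)  # about 1.96 for a 95% interval
or_table <- cbind(
  OR    = exp(est),
  lower = exp(est - z * se),
  upper = exp(est + z * se)
)
round(or_table, 3)
```

The interval for Race includes 1, which matches its non-significant Wald test, while the Sex interval lies entirely below 1. With the fitted object available, `exp(confint.default(mmod1))` should give the same Wald intervals up to rounding.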
Below we perform drop-in-deviance (likelihood-ratio) tests on each variable to check whether it is statistically significant given the other three predictors.
# Reduced models, each dropping one predictor from mmod1
mmodA <- glm(muse ~ Cigarettes + Sex + Race, family = binomial, data = mdata)
mmodC <- glm(muse ~ Alcohol + Sex + Race, family = binomial, data = mdata)
mmodS <- glm(muse ~ Alcohol + Cigarettes + Race, family = binomial, data = mdata)
mmodR <- glm(muse ~ Alcohol + Cigarettes + Sex, family = binomial, data = mdata)
dropdevA <- anova(mmodA, mmod1, test = "Chisq")
dropdevC <- anova(mmodC, mmod1, test = "Chisq")
dropdevS <- anova(mmodS, mmod1, test = "Chisq")
dropdevR <- anova(mmodR, mmod1, test = "Chisq")
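The p-values in the tables that follow come from comparing each drop in deviance with a chi-squared distribution whose degrees of freedom equal the number of parameters removed (here 1). The sketch below reproduces them from the deviance drops printed in the anova tables; it assumes those printed values are precise enough for a few-digit check.

```r
# Drop in deviance when each predictor is removed (from the anova tables)
dev_drop <- c(Alcohol = 90.784, Cigarettes = 497.53, Sex = 10.363, Race = 2.1952)

# Likelihood-ratio test: the drop is approximately chi-squared with
# df = number of parameters removed (one coefficient per predictor here)
p_values <- pchisq(dev_drop, df = 1, lower.tail = FALSE)
signif(p_values, 4)
```

With the fitted model in memory, `drop1(mmod1, test = "Chisq")` should run all four single-term deletions in one call instead of fitting the reduced models by hand.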
Alcohol:
The drop-in-deviance test gives a deviance drop of 90.78 on 1 df with a p-value below 2.2e-16, which agrees with our Wald test. The full model containing all four variables is a significant improvement over the model without Alcohol.
dropdevA
## Analysis of Deviance Table
##
## Model 1: muse ~ Cigarettes + Sex + Race
## Model 2: muse ~ Alcohol + Cigarettes + Sex + Race
## Resid. Df Resid. Dev Df Deviance Pr(>Chi)
## 1 2272 2334.2
## 2 2271 2243.4 1 90.784 < 2.2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Cigarettes:
The drop-in-deviance test gives a deviance drop of 497.53 on 1 df with a p-value below 2.2e-16, which agrees with our Wald test. The full model containing all four variables is a significant improvement over the model without Cigarettes as a predictor.
dropdevC
## Analysis of Deviance Table
##
## Model 1: muse ~ Alcohol + Sex + Race
## Model 2: muse ~ Alcohol + Cigarettes + Sex + Race
## Resid. Df Resid. Dev Df Deviance Pr(>Chi)
## 1 2272 2740.9
## 2 2271 2243.4 1 497.53 < 2.2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Race:
The drop-in-deviance test gives a deviance drop of 2.20 on 1 df with a p-value of 0.1384, which agrees with our Wald test. The full model containing all four variables is not a significant improvement over the model without Race.
dropdevR
## Analysis of Deviance Table
##
## Model 1: muse ~ Alcohol + Cigarettes + Sex
## Model 2: muse ~ Alcohol + Cigarettes + Sex + Race
## Resid. Df Resid. Dev Df Deviance Pr(>Chi)
## 1 2272 2245.6
## 2 2271 2243.4 1 2.1952 0.1384
Sex:
The drop-in-deviance test gives a deviance drop of 10.36 on 1 df with a p-value of 0.001286, which agrees with our Wald test. The full model containing all four variables is a significant improvement over the model without Sex as a predictor. Because the Sex coefficient is negative (1 = female), males have the higher estimated odds of marijuana use, so focusing prevention efforts on male students may help more.
dropdevS
## Analysis of Deviance Table
##
## Model 1: muse ~ Alcohol + Cigarettes + Race
## Model 2: muse ~ Alcohol + Cigarettes + Sex + Race
## Resid. Df Resid. Dev Df Deviance Pr(>Chi)
## 1 2272 2253.7
## 2 2271 2243.4 1 10.363 0.001286 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
In conclusion, the two most statistically significant variables were Alcohol and Cigarettes, with Sex still significant but less so, and Race not statistically significant. Students who drank alcohol or smoked cigarettes had significantly higher odds of also using marijuana: these two variables multiplied the odds of marijuana use by 19.832 and 17.447, respectively. These are very large increases, and both are important to tackle. One possible explanation is that these substances act as a gateway to marijuana, although an observational model like this one cannot establish that the relationship is causal. Efforts should be made to reduce use of these two substances, which would likely also reduce marijuana use.
Looking at the demographic variables, being female and being non-white multiplied the odds of marijuana use by 0.719 and 0.742, respectively, so females and non-white students had lower estimated odds of using marijuana. However, both the Wald and drop-in-deviance tests indicated that Race is not a significant variable. The findings for Sex were significant and indicate that males have higher odds of using marijuana. Overall, we think that the most effective way to reduce the odds of marijuana use is to decrease the use of cigarettes and alcohol.
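To simplify calculations, we create a function for each of the log-odds equations given in the homework. Each function returns the log odds of one food source relative to fish, the baseline category.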
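# Functions for using the model without lots of typing
# Each returns log(P_category / P_fish): i = invertebrates, r = reptiles, b = birds, o = other
# Please enter only 1s and 0s (indicator values) into these functions. See the homework for definitions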
logPi <- function(small,Hancock,Oklawaha,Trafford){-1.55+1.46*small-1.66*Hancock+0.94*Oklawaha+1.12*Trafford}
logPr <- function(small,Hancock,Oklawaha,Trafford){-3.31-0.35*small+1.24*Hancock+2.46*Oklawaha+2.94*Trafford}
logPb <- function(small,Hancock,Oklawaha,Trafford){-2.09-0.63*small+0.70*Hancock-0.65*Oklawaha+1.09*Trafford}
logPo <- function(small,Hancock,Oklawaha,Trafford){-1.90+0.33*small+0.83*Hancock+0.01*Oklawaha+1.52*Trafford}
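The four functions above share the same predictors, so they can also be written as one coefficient matrix multiplied by an indicator vector. The sketch below is an alternative representation, not part of the homework: the matrix `B` and the helper `log_odds_vs_fish()` are hypothetical names, and the coefficients are copied from the functions above.

```r
# Rows: food sources compared with fish (the baseline category);
# columns: intercept, small, Hancock, Oklawaha, Trafford.
B <- rbind(
  invertebrates = c(-1.55,  1.46, -1.66,  0.94, 1.12),
  reptiles      = c(-3.31, -0.35,  1.24,  2.46, 2.94),
  birds         = c(-2.09, -0.63,  0.70, -0.65, 1.09),
  other         = c(-1.90,  0.33,  0.83,  0.01, 1.52)
)

# Hypothetical helper: all four log(P_category / P_fish) values for one profile
log_odds_vs_fish <- function(small, Hancock, Oklawaha, Trafford) {
  drop(B %*% c(1, small, Hancock, Oklawaha, Trafford))
}

# Odds of each food source versus fish for large alligators in Lake George
exp(log_odds_vs_fish(0, 0, 0, 0))
```

Keeping every coefficient in one matrix makes it harder to mistype a single equation and lets one call return all four log odds at once.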
For large alligators in Lake George (all indicators equal to 0), the odds of every other food source relative to fish are below 1, ranging from 0.037 for reptiles to 0.212 for invertebrates. This means that these alligators are more likely to choose fish than any other single food source, so fish is the most likely primary food for this group.
exp(logPi(0,0,0,0))
## [1] 0.212248
exp(logPr(0,0,0,0))
## [1] 0.03651617
exp(logPb(0,0,0,0))
## [1] 0.1236871
exp(logPo(0,0,0,0))
## [1] 0.1495686
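The odds above can also be converted into probabilities. Because every odds is relative to fish and the five probabilities sum to 1, P(fish) = 1 / (1 + sum of the four odds), and each other probability is its odds times P(fish). The sketch below applies this to the baseline group, using the intercepts from the four equations.

```r
# Odds of each food source versus fish for large alligators in Lake George
odds_vs_fish <- exp(c(invertebrates = -1.55, reptiles = -3.31,
                      birds = -2.09, other = -1.90))

# Probabilities must sum to 1, which pins down P(fish)
p_fish  <- 1 / (1 + sum(odds_vs_fish))
p_other <- odds_vs_fish * p_fish
probs   <- c(fish = p_fish, p_other)
round(probs, 3)
```

Fish comes out at roughly 0.66, well ahead of invertebrates at about 0.14, which quantifies the statement that fish is the most likely primary food for this group.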
Below we find the probabilities of each primary food source for large alligators in Lake Hancock. First we compute the odds of each food source relative to fish: i = Pi/Pf, r = Pr/Pf, b = Pb/Pf, and o = Po/Pf. Because the five probabilities sum to 1, 1 - Pf = Pf*(i + r + b + o), so the odds of fish are Pf/(1 - Pf) = 1/(i + r + b + o). Converting these odds to a probability gives Pf, and each remaining probability is its odds times Pf (for example, Pi = i*Pf). We checked that the five probabilities add to 1.
# Odds of each food source relative to fish for a large alligator in Lake Hancock
i <- exp(logPi(0,1,0,0))
r <- exp(logPr(0,1,0,0))
b <- exp(logPb(0,1,0,0))
o <- exp(logPo(0,1,0,0))
# Odds of fish: Pf/(1 - Pf) = 1/(i + r + b + o)
fodds <- 1/(i + r + b + o)
pf <- fodds/(1 + fodds)
# Each other probability is its odds relative to fish times Pf
# (named pinv rather than pi so R's built-in constant pi is not overwritten)
pinv <- i*pf
pr <- r*pf
pb <- b*pf
po <- o*pf
print("probability of eating fish")
## [1] "probability of eating fish"
pf
## [1] 0.5686257
print("probability of eating invertebrates")
## [1] "probability of eating invertebrates"
pinv
## [1] 0.02294781
print("probability of eating reptiles")
## [1] "probability of eating reptiles"
pr
## [1] 0.07175247
print("probability of eating birds")
## [1] "probability of eating birds"
pb
## [1] 0.1416306
print("probability of eating other")
## [1] "probability of eating other"
po
## [1] 0.1950434
# pf + pinv + pr + pb + po checked that all probabilities add to 1
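The same calculation generalizes to every size-and-lake combination. The sketch below wraps it in a hypothetical helper, `category_probs()`, built on a coefficient matrix copied from the four log-odds equations, and tabulates all eight profiles; the Lake Hancock large-alligator row should match the values computed above.

```r
# Rows: food sources vs fish; columns: intercept, small, Hancock, Oklawaha, Trafford
B <- rbind(
  invertebrates = c(-1.55,  1.46, -1.66,  0.94, 1.12),
  reptiles      = c(-3.31, -0.35,  1.24,  2.46, 2.94),
  birds         = c(-2.09, -0.63,  0.70, -0.65, 1.09),
  other         = c(-1.90,  0.33,  0.83,  0.01, 1.52)
)

# Hypothetical helper: probabilities of all five food sources for one profile
category_probs <- function(small, Hancock, Oklawaha, Trafford) {
  odds <- exp(drop(B %*% c(1, small, Hancock, Oklawaha, Trafford)))
  p_fish <- 1 / (1 + sum(odds))       # probabilities must sum to 1
  c(fish = p_fish, odds * p_fish)     # each other category = odds * P(fish)
}

# Lake indicators (Hancock, Oklawaha, Trafford); George is the baseline lake
lakes <- list(George = c(0, 0, 0), Hancock = c(1, 0, 0),
              Oklawaha = c(0, 1, 0), Trafford = c(0, 0, 1))

rows <- list()
for (lake in names(lakes)) {
  for (small in c(0, 1)) {
    x <- lakes[[lake]]
    label <- paste(lake, if (small == 1) "small" else "large", sep = "/")
    rows[[label]] <- category_probs(small, x[1], x[2], x[3])
  }
}
prob_table <- do.call(rbind, rows)
round(prob_table, 3)
```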
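Regardless of lake, small alligators have 4.306 times the odds of large alligators of choosing a primarily invertebrate diet over fish. The ratio is identical in every lake because the model has no size-by-lake interaction: the lake terms cancel, leaving exp(1.46) = 4.306.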
print("George")
## [1] "George"
exp(logPi(1,0,0,0))/exp(logPi(0,0,0,0))
## [1] 4.30596
print("Hancock")
## [1] "Hancock"
exp(logPi(1,1,0,0))/exp(logPi(0,1,0,0))
## [1] 4.30596
print("Oklawaha")
## [1] "Oklawaha"
exp(logPi(1,0,1,0))/exp(logPi(0,0,1,0))
## [1] 4.30596
print("Trafford")
## [1] "Trafford"
exp(logPi(1,0,0,1))/exp(logPi(0,0,0,1))
## [1] 4.30596
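The four lake-by-lake ratios above can be computed in one pass, and compared directly with exp(1.46). The sketch below redefines `logPi` exactly as in the homework so that it runs on its own; the `lakes` list is a hypothetical convenience structure.

```r
# Invertebrate-vs-fish log odds, copied from the homework equation
logPi <- function(small, Hancock, Oklawaha, Trafford) {
  -1.55 + 1.46 * small - 1.66 * Hancock + 0.94 * Oklawaha + 1.12 * Trafford
}

# Lake indicators (Hancock, Oklawaha, Trafford); George is the baseline lake
lakes <- list(George = c(0, 0, 0), Hancock = c(1, 0, 0),
              Oklawaha = c(0, 1, 0), Trafford = c(0, 0, 1))

# Odds ratio, small vs large, within each lake
or_small <- sapply(lakes, function(x) {
  exp(logPi(1, x[1], x[2], x[3])) / exp(logPi(0, x[1], x[2], x[3]))
})
or_small
exp(1.46)  # the lake terms cancel, leaving exp(coefficient of small)
```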
To find the log odds of an alligator consuming primarily invertebrates rather than reptiles, we compute log(Pi/Pf) - log(Pr/Pf). The steps are shown below.
log(Pi/Pr) = log(Pi) - log(Pr) = log(Pi) - log(Pf) - log(Pr) + log(Pf) = log(Pi/Pf) - log(Pr/Pf)
log(Pi/Pf) - log(Pr/Pf) = -1.55+1.46*small-1.66*Hancock+0.94*Oklawaha+1.12*Trafford - (-3.31-0.35*small+1.24*Hancock+2.46*Oklawaha+2.94*Trafford)
= 1.76 + 1.81*small - 2.90*Hancock - 1.52*Oklawaha - 1.82*Trafford
Final regression equation:
log(Pi/Pr) = 1.76 + 1.81*I(small) - 2.90*I(Hancock) - 1.52*I(Oklawaha) - 1.82*I(Trafford)
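Because both equations share the same predictors, the coefficients of log(Pi/Pr) are just the term-by-term differences of the invertebrate and reptile coefficient vectors. The short check below does that subtraction with the coefficients copied from the homework equations.

```r
# Coefficients of log(Pi/Pf) and log(Pr/Pf) from the homework
b_inv <- c(Intercept = -1.55, small =  1.46, Hancock = -1.66,
           Oklawaha  =  0.94, Trafford = 1.12)
b_rep <- c(Intercept = -3.31, small = -0.35, Hancock =  1.24,
           Oklawaha  =  2.46, Trafford = 2.94)

# log(Pi/Pr) = log(Pi/Pf) - log(Pr/Pf), so subtract term by term
b_inv_vs_rep <- b_inv - b_rep
b_inv_vs_rep
```

Each entry is the coefficient of the corresponding indicator in log(Pi/Pr); for instance the Trafford entry is 1.12 - 2.94 = -1.82.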
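For a Wald test of the coefficient of small in the invertebrate-versus-fish equation, the test statistic is z = 1.46/0.40 = 3.65, with an upper-tail p-value of 0.000131. Doubling it for a two-sided test gives about 0.00026, which is still well below our significance level of 0.05. We therefore reject the null hypothesis that there is no difference between small and large alligators in the log odds of eating invertebrates over fish.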
teststat <- 1.46/.4
pnorm(teststat, lower.tail = FALSE)
## [1] 0.0001311202
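The code above reports the one-sided (upper-tail) p-value. Since the null hypothesis is "no difference", a two-sided p-value and a confidence interval for the odds ratio may be more natural; the sketch below computes both, assuming the same estimate (1.46) and standard error (0.40) used in the test above and the usual normal approximation.

```r
est <- 1.46  # coefficient of small in the invertebrate-vs-fish equation
se  <- 0.40  # standard error used in the Wald test above
z   <- est / se

p_one_sided <- pnorm(z, lower.tail = FALSE)
p_two_sided <- 2 * p_one_sided

# 95% Wald interval for the odds ratio, small vs large alligators
ci_or <- exp(est + c(-1, 1) * qnorm(0.975) * se)
c(z = z, p_two_sided = p_two_sided)
ci_or
```

The interval runs from about 2 to about 9.4 and excludes 1, consistent with rejecting the null at the 0.05 level.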