Quiz 3

A: Marijuana

Model with All Predictors

According to the Wald tests in the summary, Alcohol, Cigarettes, and Sex are all significant at the 0.05 level. Cigarettes has the largest test statistic (z = 17.40), while Alcohol has the largest estimated coefficient. Race does not meet the 0.05 threshold (p = 0.138) and appears to be the least influential variable.

The coefficients for each variable will be interpreted in the context of the problem later.

muse <- cbind(mdata$Marijuanna,1-mdata$Marijuanna)
mmod1 <- glm(muse~Alcohol+Cigarettes+Sex+Race, family = binomial, data = mdata)
summary(mmod1)

## 
## Call:
## glm(formula = muse ~ Alcohol + Cigarettes + Sex + Race, family = binomial, 
##     data = mdata)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -1.4908  -0.4701  -0.1084   0.8935   3.3070  
## 
## Coefficients:
##             Estimate Std. Error z value Pr(>|z|)    
## (Intercept)  -5.1344     0.4775 -10.752  < 2e-16 ***
## Alcohol       2.9873     0.4655   6.417 1.39e-10 ***
## Cigarettes    2.8592     0.1643  17.401  < 2e-16 ***
## Sex          -0.3297     0.1026  -3.212  0.00132 ** 
## Race         -0.2989     0.2015  -1.483  0.13796    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 3099.3  on 2275  degrees of freedom
## Residual deviance: 2243.4  on 2271  degrees of freedom
## AIC: 2253.4
## 
## Number of Fisher Scoring iterations: 7
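
The z values and p-values in the coefficient table can be reproduced by hand from the estimates and standard errors. This is a minimal sketch that uses the rounded numbers printed above, so it matches the table only to about two decimals. The names `est`, `se`, `z`, and `p` are ours and are not part of the original analysis.

```r
# Estimates and standard errors copied from the summary(mmod1) output above
est <- c(Alcohol = 2.9873, Cigarettes = 2.8592, Sex = -0.3297, Race = -0.2989)
se  <- c(Alcohol = 0.4655, Cigarettes = 0.1643, Sex = 0.1026, Race = 0.2015)

# Wald statistic: estimate divided by its standard error
z <- est / se

# Two-sided p-value from the standard normal distribution
p <- 2 * pnorm(abs(z), lower.tail = FALSE)

round(z, 2)
signif(p, 3)
```

A larger |z| means stronger evidence against a zero coefficient, which is a separate question from which coefficient is largest.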

exp(coef(mmod1))

##  (Intercept)      Alcohol   Cigarettes          Sex         Race 
##  0.005890848 19.832202285 17.447305099  0.719162324  0.741647834

Model Interpretation

After exponentiating the coefficients, we can see their multiplicative effect on the odds of a student using marijuana. The intercept tells us that a white male student who neither drinks alcohol nor smokes cigarettes has very small odds of using marijuana, only 0.00589.

Alcohol: Holding all else constant, if a student drinks alcohol, their odds of smoking marijuana increase by a factor of 19.832.

Cigarettes: Holding all else constant, if a student smokes cigarettes, their odds of smoking marijuana increase by a factor of 17.447.

Sex: (reminder, 1 = female) Holding all else constant, if a student is female, their odds of smoking marijuana change by a factor of 0.719. In other words, holding all else constant, the odds of smoking marijuana for a male are about 1.39 times the odds for a female.

Race: (reminder, 1 = not white) Holding all else constant, if a student is not white, their odds of smoking marijuana change by a factor of 0.742. In other words, holding all else constant, the odds of smoking marijuana for a white student are about 1.35 times the odds for a non-white student.

exp(coef(mmod1))

##  (Intercept)      Alcohol   Cigarettes          Sex         Race 
##  0.005890848 19.832202285 17.447305099  0.719162324  0.741647834

exp(-coef(mmod1))

##  (Intercept)      Alcohol   Cigarettes          Sex         Race 
## 169.75485475   0.05042304   0.05731544   1.39050666   1.34834885
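
Odds can be hard to picture, so it may help to convert a few profiles to predicted probabilities. The sketch below assumes the rounded coefficients printed above. `p_use` is a hypothetical helper we introduce for illustration, not something from the original analysis.

```r
# Coefficients copied from the summary(mmod1) output (log-odds scale)
b <- c(Intercept = -5.1344, Alcohol = 2.9873, Cigarettes = 2.8592,
       Sex = -0.3297, Race = -0.2989)

# Hypothetical helper: probability of marijuana use for one 0/1 profile
p_use <- function(alcohol, cigarettes, sex, race) {
  eta <- b[["Intercept"]] + b[["Alcohol"]] * alcohol +
    b[["Cigarettes"]] * cigarettes + b[["Sex"]] * sex + b[["Race"]] * race
  plogis(eta)  # inverse logit: exp(eta) / (1 + exp(eta))
}

p_baseline <- p_use(0, 0, 0, 0)  # white male, no alcohol, no cigarettes
p_both     <- p_use(1, 1, 0, 0)  # white male who drinks and smokes

c(baseline = p_baseline, both = p_both)
```

Under these assumptions the baseline probability is well under 1%, while a white male student who both drinks and smokes has a predicted probability of roughly two in three.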

Below we perform drop-in-deviance tests on each variable to check whether the Wald test results hold up.

mmodA <- glm(muse~Cigarettes+Sex+Race, family = binomial, data = mdata)
mmodC <- glm(muse~Alcohol+Sex+Race, family = binomial, data = mdata)
mmodS <- glm(muse~Alcohol+Cigarettes+Race, family = binomial, data = mdata)
mmodR <- glm(muse~Alcohol+Cigarettes+Sex, family = binomial, data = mdata)
dropdevA <- anova(mmodA, mmod1, test = "Chisq")
dropdevC <- anova(mmodC, mmod1, test = "Chisq")
dropdevS <- anova(mmodS, mmod1, test = "Chisq")
dropdevR <- anova(mmodR, mmod1, test = "Chisq")

Alcohol:

The drop-in-deviance test finds a p-value below 2.2e-16, which agrees with our Wald test. The more complex model containing all variables is a significant improvement over the model without alcohol.

dropdevA

## Analysis of Deviance Table
## 
## Model 1: muse ~ Cigarettes + Sex + Race
## Model 2: muse ~ Alcohol + Cigarettes + Sex + Race
##   Resid. Df Resid. Dev Df Deviance  Pr(>Chi)    
## 1      2272     2334.2                          
## 2      2271     2243.4  1   90.784 < 2.2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Cigarettes:

The drop-in-deviance test finds a p-value below 2.2e-16, which agrees with our Wald test. The more complex model containing all variables is a significant improvement over the model without cigarettes as a predictor.

dropdevC

## Analysis of Deviance Table
## 
## Model 1: muse ~ Alcohol + Sex + Race
## Model 2: muse ~ Alcohol + Cigarettes + Sex + Race
##   Resid. Df Resid. Dev Df Deviance  Pr(>Chi)    
## 1      2272     2740.9                          
## 2      2271     2243.4  1   497.53 < 2.2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Race:

The drop-in-deviance test finds a p-value of 0.1384, which agrees with our Wald test. The more complex model containing all variables is not a significant improvement over the model without race.

dropdevR

## Analysis of Deviance Table
## 
## Model 1: muse ~ Alcohol + Cigarettes + Sex
## Model 2: muse ~ Alcohol + Cigarettes + Sex + Race
##   Resid. Df Resid. Dev Df Deviance Pr(>Chi)
## 1      2272     2245.6                     
## 2      2271     2243.4  1   2.1952   0.1384

Sex:

The drop-in-deviance test finds a p-value of 0.001286, which agrees with our Wald test. The more complex model containing all variables is a significant improvement over the model without sex as a predictor. Because the coefficient for Sex (1 = female) is negative, males have the higher odds of using marijuana, so prevention efforts aimed at male students could have more impact.

dropdevS

## Analysis of Deviance Table
## 
## Model 1: muse ~ Alcohol + Cigarettes + Race
## Model 2: muse ~ Alcohol + Cigarettes + Sex + Race
##   Resid. Df Resid. Dev Df Deviance Pr(>Chi)   
## 1      2272     2253.7                        
## 2      2271     2243.4  1   10.363 0.001286 **
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
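
Each of the four tests above can also be computed by hand from the residual deviances. This is a minimal sketch using the rounded deviances printed in the tables, so the results agree with `anova()` only approximately. The variable names are ours.

```r
# Residual deviances copied from the anova() tables above
dev_full    <- 2243.4
dev_reduced <- c(Alcohol = 2334.2, Cigarettes = 2740.9,
                 Sex = 2253.7, Race = 2245.6)

# Drop in deviance when each predictor is added back to the model
drop <- dev_reduced - dev_full

# Each comparison adds one parameter, so compare to a chi-square with 1 df
p <- pchisq(drop, df = 1, lower.tail = FALSE)

round(drop, 1)
signif(p, 3)
```

The reduced model always has the larger deviance, and the p-value is the upper tail of the chi-square distribution at the size of that drop.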

In conclusion, the two most statistically significant variables were alcohol and cigarette use, with sex still significant but less strongly so, and race not statistically significant. Students who used alcohol or cigarettes were significantly more likely to also use marijuana. The two variables had multiplicative effects of 19.832 and 17.447 respectively on the odds of smoking marijuana. These are tremendous increases, and both are important to tackle. It may be that using these two substances acts as a gateway to marijuana. Efforts should be made to reduce use of these two substances, which would likely also affect marijuana usage.

Looking at sex and race, we found that being female and being non-white changed the odds by factors of 0.719 and 0.742 respectively. This suggests that females and non-white students had lower odds of using marijuana. However, the Wald and drop-in-deviance tests indicated that race is not a significant variable. The findings regarding sex were significant and showed that males have higher odds of using marijuana. Overall, we think that the most effective way to reduce the odds of marijuana usage is to decrease usage of cigarettes and alcohol.

B: Alligators

Below we create functions for each of the log odds equations given to us to simplify calculations.

#Functions for using the model without lots of typing
#Please enter only 1s and 0s into these functions. See Homework for definitions
logPi <- function(small,Hancock,Oklawaha,Trafford){-1.55+1.46*small-1.66*Hancock+0.94*Oklawaha+1.12*Trafford}

logPr <- function(small,Hancock,Oklawaha,Trafford){-3.31-0.35*small+1.24*Hancock+2.46*Oklawaha+2.94*Trafford}

logPb <- function(small,Hancock,Oklawaha,Trafford){-2.09-0.63*small+0.70*Hancock-0.65*Oklawaha+1.09*Trafford}

logPo <- function(small,Hancock,Oklawaha,Trafford){-1.90+0.33*small+0.83*Hancock+0.01*Oklawaha+1.52*Trafford}

1. Big Lake George Gators

For each of the other food sources, the odds of choosing it over fish are well below 1 (between about 0.04 and 0.21). That means large gators in Lake George are more likely to choose fish than any other single food source, so fish would be the most popular food for these gators in our sample.

exp(logPi(0,0,0,0))

## [1] 0.212248

exp(logPr(0,0,0,0))

## [1] 0.03651617

exp(logPb(0,0,0,0))

## [1] 0.1236871

exp(logPo(0,0,0,0))

## [1] 0.1495686

2. Big Lake Hancock Gators

Below we find the probabilities for each primary food source for large alligators in Lake Hancock. First we found the odds of each food source over fish. These four odds sum to (1 - f)/f, where f is the probability of fish, so the odds of fish, f/(1 - f), are the reciprocal of that sum. From there we could find the probability of eating fish, and then multiply it by each of the odds to get the probabilities for the other food sources. We checked that the five probabilities add to 1.

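# Odds of each food source relative to fish (large gator, Lake Hancock)
i <- exp(logPi(0,1,0,0))
r <- exp(logPr(0,1,0,0))
b <- exp(logPb(0,1,0,0))
o <- exp(logPo(0,1,0,0))
# Odds of fish versus everything else, then the probability of fish
fodds <- (1/(i + r + b + o))
pf <- fodds/(1+fodds)
# Probability of each other source = (odds over fish) * P(fish)
# Note: the name pi masks R's built-in constant pi for the rest of the session
pi <- i*pf
pr <- r*pf
pb <- b*pf
po <- o*pf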
print("probability of eating fish")

## [1] "probability of eating fish"

pf

## [1] 0.5686257

print("probability of eating invertebrates")

## [1] "probability of eating invertebrates"

pi

## [1] 0.02294781

print("probability of eating reptiles")

## [1] "probability of eating reptiles"

pr

## [1] 0.07175247

print("probability of eating birds")

## [1] "probability of eating birds"

pb

## [1] 0.1416306

print("probability of eating other")

## [1] "probability of eating other"

po

## [1] 0.1950434

#pf+pi+pr+pb+po checked that all probabilities add to 1
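
The same calculation can be written once for any alligator profile by treating fish as the baseline category with log odds 0. The sketch below is one possible way to do it. `diet_probs` is a hypothetical helper name, and the coefficients are copied from the four equations at the top of this part.

```r
# Hypothetical helper: all five diet probabilities for one 0/1 profile
diet_probs <- function(small, Hancock, Oklawaha, Trafford) {
  # Log odds relative to fish; fish is the baseline, so its log odds are 0
  eta <- c(
    fish          = 0,
    invertebrates = -1.55 + 1.46*small - 1.66*Hancock + 0.94*Oklawaha + 1.12*Trafford,
    reptiles      = -3.31 - 0.35*small + 1.24*Hancock + 2.46*Oklawaha + 2.94*Trafford,
    birds         = -2.09 - 0.63*small + 0.70*Hancock - 0.65*Oklawaha + 1.09*Trafford,
    other         = -1.90 + 0.33*small + 0.83*Hancock + 0.01*Oklawaha + 1.52*Trafford
  )
  # Normalize the odds so the five probabilities sum to 1
  exp(eta) / sum(exp(eta))
}

hancock_large <- diet_probs(0, 1, 0, 0)
round(hancock_large, 4)
sum(hancock_large)
```

Dividing by the sum of all five exponentiated terms guarantees the probabilities add to 1, and it avoids reusing the name `pi`.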

3. Small vs Large Gators

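It appears that regardless of lake, small alligators have 4.306 times the odds of an invertebrate diet over fish compared with large alligators. The ratio is the same in every lake because the model has no interaction between size and lake, so the lake terms cancel and the ratio is always exp(1.46).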

print("George")

## [1] "George"

exp(logPi(1,0,0,0))/exp(logPi(0,0,0,0))

## [1] 4.30596

print("Hancock")

## [1] "Hancock"

exp(logPi(1,1,0,0))/exp(logPi(0,1,0,0))

## [1] 4.30596

print("Oklawaha")

## [1] "Oklawaha"

exp(logPi(1,0,1,0))/exp(logPi(0,0,1,0))

## [1] 4.30596

print("Trafford")

## [1] "Trafford"

exp(logPi(1,0,0,1))/exp(logPi(0,0,0,1))

## [1] 4.30596
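
The four ratios can also be computed in one pass and compared with the exponentiated size coefficient. This is a short sketch: `logPi` is redefined here only so the block runs on its own, and `lakes` and `ratios` are names we made up for illustration.

```r
# Log odds of invertebrates over fish, as defined earlier in this part
logPi <- function(small, Hancock, Oklawaha, Trafford) {
  -1.55 + 1.46*small - 1.66*Hancock + 0.94*Oklawaha + 1.12*Trafford
}

# One row per lake: indicator values for Hancock, Oklawaha, Trafford
lakes <- rbind(George   = c(0, 0, 0),
               Hancock  = c(1, 0, 0),
               Oklawaha = c(0, 1, 0),
               Trafford = c(0, 0, 1))

# Small-to-large odds ratio in each lake
ratios <- apply(lakes, 1, function(l) {
  exp(logPi(1, l[1], l[2], l[3]) - logPi(0, l[1], l[2], l[3]))
})

ratios
exp(1.46)  # the lake terms cancel, leaving only the size coefficient
```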

4. Invertebrates Over Reptiles

In order to compute the odds of a gator consuming primarily invertebrates rather than reptiles, we need to compute log(Pi/Pf) - log(Pr/Pf). The steps are shown below.

log(Pi/Pr) = log(Pi) - log(Pr) = log(Pi) - log(Pf) - log(Pr) + log(Pf) = log(Pi/Pf) - log(Pr/Pf)

log(Pi/Pf) - log(Pr/Pf) = -1.55+1.46*small-1.66*Hancock+0.94*Oklawaha+1.12*Trafford - (-3.31-0.35*small+1.24*Hancock+2.46*Oklawaha+2.94*Trafford)
= 1.76 + 1.81*small - 2.90*Hancock - 1.52*Oklawaha - 1.82*Trafford

Final regression equation: 
log(Pi/Pr) = 1.76 + 1.81*I(small) - 2.90*I(Hancock) - 1.52*I(Oklawaha) - 1.82*I(Trafford)
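
The subtraction is easy to get wrong by hand, so it may be worth checking term by term. This is a small sketch: the vectors hold the coefficients of the invertebrates-vs-fish and reptiles-vs-fish equations given earlier, and the names `b_inv`, `b_rep`, and `b_inv_vs_rep` are ours.

```r
terms <- c("(Intercept)", "small", "Hancock", "Oklawaha", "Trafford")

# Coefficients of log(Pi/Pf) and log(Pr/Pf), in the same term order
b_inv <- setNames(c(-1.55,  1.46, -1.66, 0.94, 1.12), terms)
b_rep <- setNames(c(-3.31, -0.35,  1.24, 2.46, 2.94), terms)

# log(Pi/Pr) = log(Pi/Pf) - log(Pr/Pf), so subtract coefficient by coefficient
b_inv_vs_rep <- b_inv - b_rep

round(b_inv_vs_rep, 2)
```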

5. Wald Test

After conducting a Wald test on the variable small, we find a one-sided p-value of 0.0001311202, which is below our significance level of 0.05. Doubling it for a two-sided test gives about 0.00026, which is still well below 0.05. This means that we reject the null hypothesis that there is no difference in the log odds of eating invertebrates over fish between small and large alligators.

teststat <- 1.46/.4
pnorm(teststat, lower.tail = FALSE)

## [1] 0.0001311202
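
A confidence interval for the odds ratio gives a sense of effect size alongside the test. The sketch below builds a 95% Wald interval from the same estimate (1.46) and standard error (0.4) used in the test statistic above. The interval is our addition and was not part of the original quiz answer.

```r
est <- 1.46  # coefficient for small in the invertebrates-vs-fish equation
se  <- 0.4   # standard error used in the test statistic above

# 95% Wald interval on the log-odds scale, then exponentiated
ci_log <- est + c(-1, 1) * qnorm(0.975) * se
ci_or  <- exp(ci_log)

exp(est)  # point estimate of the odds ratio
ci_or     # lower and upper limits of the odds ratio interval
```

Under these assumptions the whole interval lies above 1, which is consistent with rejecting the null hypothesis.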

Quiz 3

Sarah Chock and Lukas Buhler

4/4/2021
