Marijuana

mysurvey <- read.csv("survey.csv" )

The model below is a binomial logistic regression for marijuana use that includes all available predictors as main effects (no interaction terms).

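# Two-column response: counts in Success and Failure for each covariate pattern
usevar <- cbind(mysurvey$Success, mysurvey$Failure)
# attach() is not needed because every predictor is looked up through data = mysurvey
fullmodel <- glm(usevar ~ Alcohol + Cigarette + Sex + Race, family = "binomial", data = mysurvey)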
summary(fullmodel)

## 
## Call:
## glm(formula = usevar ~ Alcohol + Cigarette + Sex + Race, family = "binomial", 
##     data = mysurvey)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -1.2642  -0.3413  -0.2520   0.3995   0.8531  
## 
## Coefficients:
##              Estimate Std. Error z value Pr(>|z|)    
## (Intercept)   -5.7629     0.5103 -11.294  < 2e-16 ***
## AlcoholYes     2.9873     0.4655   6.417 1.39e-10 ***
## CigaretteYes   2.8592     0.1643  17.401  < 2e-16 ***
## Sexmale        0.3297     0.1026   3.212  0.00132 ** 
## Racewhite      0.2989     0.2015   1.483  0.13796    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 860.6086  on 15  degrees of freedom
## Residual deviance:   4.6939  on 11  degrees of freedom
## AIC: 60.689
## 
## Number of Fisher Scoring iterations: 5

exp(coef(fullmodel))

##  (Intercept)   AlcoholYes CigaretteYes      Sexmale    Racewhite 
##  0.003141973 19.832202285 17.447305099  1.390506658  1.348348845

Model interpretation

Exponentiating the coefficients gives each predictor's multiplicative effect on the odds that a student uses marijuana.
Alcohol: holding the other predictors constant, a student who drinks alcohol has odds of marijuana use 19.832 times those of a student who does not.
Cigarette: holding the other predictors constant, a student who smokes cigarettes has odds of marijuana use 17.447 times those of a student who does not.
Sex: holding the other predictors constant, a male student has odds of marijuana use 1.3905 times those of a female student.
Race: holding the other predictors constant, a white student has odds of marijuana use 1.3483 times those of a student of another race, although this coefficient is not statistically significant (p = 0.138).
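A point estimate alone does not show how precisely each odds ratio is estimated. The sketch below recomputes the odds ratios from the coefficient table and adds approximate 95% Wald intervals. The estimates and standard errors are typed in from the summary output above, and the Wald interval is an added assumption, not something the analysis above reports.

```r
# Estimates and standard errors copied from the summary(fullmodel) table above
est <- c(AlcoholYes = 2.9873, CigaretteYes = 2.8592, Sexmale = 0.3297, Racewhite = 0.2989)
se  <- c(AlcoholYes = 0.4655, CigaretteYes = 0.1643, Sexmale = 0.1026, Racewhite = 0.2015)

z_crit <- qnorm(0.975)  # about 1.96 for a 95% interval

# Exponentiate from the log-odds scale to get odds ratios and their Wald limits
or_table <- cbind(
  odds_ratio = exp(est),
  lower      = exp(est - z_crit * se),
  upper      = exp(est + z_crit * se)
)
print(round(or_table, 3))
```

Under this approximation, the interval for Racewhite is the only one that includes 1, which agrees with its p-value of 0.138 in the coefficient table.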

Drop in deviance

modAlcohol <- glm(usevar ~ Cigarette + Sex + Race, family = binomial, data = mysurvey)
modCigarette <- glm(usevar ~ Alcohol + Sex + Race, family = binomial, data = mysurvey)
modSex <- glm(usevar ~ Alcohol + Cigarette + Race, family = binomial, data = mysurvey)
modRace <- glm(usevar ~ Alcohol + Cigarette + Sex, family = binomial, data = mysurvey)
dropAlcohol <- anova(modAlcohol, fullmodel, test = "Chisq")
dropCigarette <- anova(modCigarette, fullmodel, test = "Chisq")
dropSex <- anova(modSex, fullmodel, test = "Chisq")
dropRace <- anova(modRace, fullmodel, test = "Chisq")

Alcohol

dropAlcohol

## Analysis of Deviance Table
## 
## Model 1: usevar ~ Cigarette + Sex + Race
## Model 2: usevar ~ Alcohol + Cigarette + Sex + Race
##   Resid. Df Resid. Dev Df Deviance  Pr(>Chi)    
## 1        12     95.478                          
## 2        11      4.694  1   90.784 < 2.2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

The drop-in-deviance test gives a p-value below 2.2e-16, so alcohol is a highly significant predictor: adding it to the model that already contains cigarette, sex, and race significantly improves the fit.

Cigarette

dropCigarette

## Analysis of Deviance Table
## 
## Model 1: usevar ~ Alcohol + Sex + Race
## Model 2: usevar ~ Alcohol + Cigarette + Sex + Race
##   Resid. Df Resid. Dev Df Deviance  Pr(>Chi)    
## 1        12     502.23                          
## 2        11       4.69  1   497.53 < 2.2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

The drop-in-deviance test gives a p-value below 2.2e-16, so cigarette is a highly significant predictor: adding it to the model that already contains alcohol, sex, and race significantly improves the fit.

Race

dropRace

## Analysis of Deviance Table
## 
## Model 1: usevar ~ Alcohol + Cigarette + Sex
## Model 2: usevar ~ Alcohol + Cigarette + Sex + Race
##   Resid. Df Resid. Dev Df Deviance Pr(>Chi)
## 1        12     6.8890                     
## 2        11     4.6939  1   2.1952   0.1384

The drop-in-deviance test gives a p-value of 0.1384, so race is not significant at the 0.05 level: adding it to the model that already contains alcohol, cigarette, and sex does not significantly improve the fit.

Sex

dropSex

## Analysis of Deviance Table
## 
## Model 1: usevar ~ Alcohol + Cigarette + Race
## Model 2: usevar ~ Alcohol + Cigarette + Sex + Race
##   Resid. Df Resid. Dev Df Deviance Pr(>Chi)   
## 1        12    15.0567                        
## 2        11     4.6939  1   10.363 0.001286 **
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

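The drop-in-deviance test gives a p-value of 0.001286, so sex is significant at the 0.05 level: adding it to the model that already contains alcohol, cigarette, and race significantly improves the fit. Sex therefore helps the model, although its drop in deviance (10.363) is far smaller than those for alcohol and cigarette.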
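The p-values in the four tables can be reproduced by hand, which makes the logic of the test explicit: the drop in deviance is compared with a chi-square distribution whose degrees of freedom equal the number of parameters dropped. This is a minimal sketch that uses the residual deviances typed in from the tables above; the variable names are illustrative only.

```r
# Residual deviances copied from the four analysis-of-deviance tables above
dev_full    <- 4.6939
dev_reduced <- c(Alcohol = 95.478, Cigarette = 502.23, Race = 6.8890, Sex = 15.0567)

# Each reduced model drops one predictor, so every test has 1 degree of freedom
drop_dev <- dev_reduced - dev_full
p_values <- pchisq(drop_dev, df = 1, lower.tail = FALSE)

print(signif(cbind(drop_in_deviance = drop_dev, p_value = p_values), 4))
```

If the fitted `fullmodel` object is available, `drop1(fullmodel, test = "Chisq")` should give the same four tests in a single call, without fitting each reduced model separately.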

To sum up, alcohol and cigarette use are the two most statistically significant predictors; sex is also significant, and race is not. Students who use cigarettes or alcohol tend to use marijuana as well: holding the other predictors constant, alcohol use multiplies the odds of marijuana use by 19.832 and cigarette use multiplies them by 17.447. In my opinion, as a real-life implication, the best way to reduce the number of students using marijuana is to reduce cigarette smoking and alcohol drinking, as well as the use of other substances.

Alligators

logPI <- function(small,Hancock,Oklawaha,Trafford){-1.55+1.46*small-1.66*Hancock+0.94*Oklawaha+1.12*Trafford}

logPR <- function(small,Hancock,Oklawaha,Trafford){-3.31-0.35*small+1.24*Hancock+2.46*Oklawaha+2.94*Trafford}

logPB <- function(small,Hancock,Oklawaha,Trafford){-2.09-0.63*small+0.70*Hancock-0.65*Oklawaha+1.09*Trafford}

logPO <- function(small,Hancock,Oklawaha,Trafford){-1.90+0.33*small+0.83*Hancock+0.01*Oklawaha+1.52*Trafford}

1. Lake George alligators

exp(logPI(0,0,0,0))

## [1] 0.212248

exp(logPR(0,0,0,0))

## [1] 0.03651617

exp(logPB(0,0,0,0))

## [1] 0.1236871

exp(logPO(0,0,0,0))

## [1] 0.1495686

2. Lake Hancock alligators

I <- exp(logPI(0,1,0,0))
R <- exp(logPR(0,1,0,0))
B <- exp(logPB(0,1,0,0))
O <- exp(logPO(0,1,0,0))
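fodds <- (1/(I + R + B + O))  ## odds of fish versus any other food
pf <- fodds/(1+fodds)  ## probability of eating primarily fish
pI <- I*pf  ## probability of eating primarily invertebrates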
pI

## [1] 0.02294781

pR <- R*pf ## probability of eating primarily reptiles
pR

## [1] 0.07175247

pB <- B*pf  ## probability of eating primarily birds
pB

## [1] 0.1416306

pO <- O*pf ## probability of eating primarily other food
pO

## [1] 0.1950434
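The same calculation can be wrapped in one function that returns all five probabilities for any lake and size, which also makes it easy to check that they sum to 1. This is a sketch only: the coefficient matrix is typed in from the four equations above, the name `diet_probs` is hypothetical, and the category labels assume I, R, B, O stand for invertebrates, reptiles, birds, and other, with fish as the baseline.

```r
# Coefficients copied from the four fitted equations above (baseline: fish)
coefs <- rbind(
  invertebrate = c(-1.55,  1.46, -1.66,  0.94, 1.12),
  reptile      = c(-3.31, -0.35,  1.24,  2.46, 2.94),
  bird         = c(-2.09, -0.63,  0.70, -0.65, 1.09),
  other        = c(-1.90,  0.33,  0.83,  0.01, 1.52)
)
colnames(coefs) <- c("intercept", "small", "Hancock", "Oklawaha", "Trafford")

# Hypothetical helper: probabilities of all five food categories
diet_probs <- function(small, Hancock, Oklawaha, Trafford) {
  x <- c(1, small, Hancock, Oklawaha, Trafford)
  odds <- exp(drop(coefs %*% x))    # odds of each food versus fish
  p_fish <- 1 / (1 + sum(odds))     # fish is the baseline category
  c(fish = p_fish, odds * p_fish)
}

hancock_large <- diet_probs(small = 0, Hancock = 1, Oklawaha = 0, Trafford = 0)
print(round(hancock_large, 4))
print(sum(hancock_large))
```

Calling `diet_probs(1, 0, 0, 1)` would give the corresponding probabilities for a small alligator in Lake Trafford.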

3. Small and large alligator

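At every lake, a small alligator has 4.306 times the odds of eating primarily invertebrates rather than fish compared with a large alligator. The ratio equals exp(1.46) and is the same for all four lakes because the model has no size-by-lake interaction.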

George

exp(logPI(1,0,0,0))/exp(logPI(0,0,0,0))

## [1] 4.30596

Hancock

exp(logPI(1,1,0,0))/exp(logPI(0,1,0,0))

## [1] 4.30596

Oklawaha

exp(logPI(1,0,1,0))/exp(logPI(0,0,1,0))

## [1] 4.30596

Trafford

exp(logPI(1,0,0,1))/exp(logPI(0,0,0,1))

## [1] 4.30596
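The four ratios above are identical because the lake terms cancel when the large-alligator log odds are subtracted from the small-alligator log odds, leaving only the coefficient 1.46. A compact sketch of that check, assuming the same fitted equation and using an illustrative indicator matrix named `lakes`:

```r
# Fitted log odds of invertebrates versus fish, copied from the equation above
logPI <- function(small, Hancock, Oklawaha, Trafford) {
  -1.55 + 1.46 * small - 1.66 * Hancock + 0.94 * Oklawaha + 1.12 * Trafford
}

# One row of lake indicators per lake; Lake George is the all-zero baseline
lakes <- rbind(George   = c(0, 0, 0), Hancock  = c(1, 0, 0),
               Oklawaha = c(0, 1, 0), Trafford = c(0, 0, 1))

# Odds ratio, small versus large, computed within each lake
size_or <- apply(lakes, 1, function(l) {
  exp(logPI(1, l[1], l[2], l[3]) - logPI(0, l[1], l[2], l[3]))
})
print(size_or)
print(exp(1.46))
```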

4. Invertebrates vs Reptiles

The log odds that an alligator eats primarily invertebrates rather than reptiles follow from the two fish-baseline equations:
log(PI/PR) = log(PI) - log(PR) = log(PI) - log(Pf) - log(PR) + log(Pf) = log(PI/Pf) - log(PR/Pf)
Substituting the fitted equations for log(PI/Pf) and log(PR/Pf) and subtracting coefficient by coefficient gives the final equation:
log(PI/PR) = 1.76 + 1.81*I(small) - 2.90*I(Hancock) - 1.52*I(Oklawaha) - 1.82*I(Trafford)
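The coefficient-by-coefficient subtraction can be checked numerically. The sketch below copies the coefficients of the invertebrate and reptile equations (both relative to fish) from the function definitions above; the vector names `b_I` and `b_R` are illustrative.

```r
# Coefficients of log(PI/Pf) and log(PR/Pf), copied from logPI and logPR above
b_I <- c(intercept = -1.55, small =  1.46, Hancock = -1.66, Oklawaha = 0.94, Trafford = 1.12)
b_R <- c(intercept = -3.31, small = -0.35, Hancock =  1.24, Oklawaha = 2.46, Trafford = 2.94)

# log(PI/PR) = log(PI/Pf) - log(PR/Pf), so the coefficients simply subtract
b_IR <- b_I - b_R
print(round(b_IR, 2))
```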

5. Wald Test

teststat <- 1.46/.4
pnorm(teststat, lower.tail = FALSE)

## [1] 0.0001311202

The one-sided p-value for small is 0.0001311 (about 0.00026 two-sided), which is below the 0.05 significance level. We reject the null hypothesis that there is no difference in the log odds of eating invertebrates over fish between large and small alligators.
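Because the null hypothesis is "no difference", a two-sided p-value is the usual companion to this Wald statistic. The sketch below computes both versions from the same estimate (1.46) and standard error (0.4) used above, and adds an approximate 95% Wald interval for the odds ratio; the interval is an addition for illustration, not part of the original test.

```r
estimate  <- 1.46  # coefficient of small in the invertebrate-vs-fish equation
std_error <- 0.4   # standard error used in the test above

z <- estimate / std_error
p_one_sided <- pnorm(z, lower.tail = FALSE)
p_two_sided <- 2 * pnorm(abs(z), lower.tail = FALSE)

# Approximate 95% Wald interval for the odds ratio exp(1.46)
or_ci <- exp(estimate + c(-1, 1) * qnorm(0.975) * std_error)

print(c(z = z, one_sided = p_one_sided, two_sided = p_two_sided))
print(round(or_ci, 2))
```

Either p-value leads to the same decision at the 0.05 level, and the interval for the odds ratio lies entirely above 1.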

Week8

Phuong Nguyen

4/13/2021