mysurvey <- read.csv("survey.csv")
This model for marijuana use includes all the available predictors, without interaction terms.
attach(mysurvey)
# two-column response: Success and Failure counts for each row of the survey table
usevar <- cbind(mysurvey$Success, mysurvey$Failure)
fullmodel <- glm(usevar ~ Alcohol + Cigarette + Sex + Race, family = "binomial", data = mysurvey)
summary(fullmodel)
##
## Call:
## glm(formula = usevar ~ Alcohol + Cigarette + Sex + Race, family = "binomial",
## data = mysurvey)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -1.2642 -0.3413 -0.2520 0.3995 0.8531
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -5.7629 0.5103 -11.294 < 2e-16 ***
## AlcoholYes 2.9873 0.4655 6.417 1.39e-10 ***
## CigaretteYes 2.8592 0.1643 17.401 < 2e-16 ***
## Sexmale 0.3297 0.1026 3.212 0.00132 **
## Racewhite 0.2989 0.2015 1.483 0.13796
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 860.6086 on 15 degrees of freedom
## Residual deviance: 4.6939 on 11 degrees of freedom
## AIC: 60.689
##
## Number of Fisher Scoring iterations: 5
exp(coef(fullmodel))
## (Intercept) AlcoholYes CigaretteYes Sexmale Racewhite
## 0.003141973 19.832202285 17.447305099 1.390506658 1.348348845
After exponentiating the coefficients, I can see their multiplicative effects on the odds of a student using marijuana.
Alcohol: holding everything else constant, if a student drinks alcohol, their odds of using marijuana increase by a factor of 19.832.
Cigarettes: holding everything else constant, if a student smokes cigarettes, their odds of using marijuana increase by a factor of 17.447.
Sex: holding everything else constant, a male student's odds of using marijuana are 1.3905 times those of a female student.
Race: holding everything else constant, a white student's odds of using marijuana are 1.3483 times those of a student of another race, although this coefficient is not statistically significant (p = 0.138).
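To go with the point estimates above, here is a minimal sketch of approximate 95% Wald confidence intervals for the odds ratios. It only re-uses the estimates and standard errors printed in the summary table, typed in by hand, so it runs without survey.csv. The names est, se and or_table are my own and are not part of the original analysis.

```r
# Estimates and standard errors copied by hand from summary(fullmodel)
est <- c(AlcoholYes = 2.9873, CigaretteYes = 2.8592, Sexmale = 0.3297, Racewhite = 0.2989)
se  <- c(AlcoholYes = 0.4655, CigaretteYes = 0.1643, Sexmale = 0.1026, Racewhite = 0.2015)

z <- qnorm(0.975)  # about 1.96 for a 95% interval

# Wald interval on the log-odds scale, then exponentiate to get odds ratios
or_table <- cbind(OR    = exp(est),
                  lower = exp(est - z * se),
                  upper = exp(est + z * se))
round(or_table, 3)
```

An interval that contains 1 goes with a coefficient that is not significant at the 0.05 level, which is what I would expect for Racewhite. With the fitted model available, exp(confint.default(fullmodel)) should give the same kind of Wald intervals directly.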
modAlcohol <- glm(usevar ~ Cigarette + Sex + Race, family = binomial, data = mysurvey)
modCigarette <- glm(usevar ~ Alcohol + Sex + Race, family = binomial, data = mysurvey)
modSex <- glm(usevar ~ Alcohol + Cigarette + Race, family = binomial, data = mysurvey)
modRace <- glm(usevar ~ Alcohol + Cigarette + Sex, family = binomial, data = mysurvey)
dropAlcohol <- anova(modAlcohol, fullmodel, test = "Chisq")
dropCigarette <- anova(modCigarette, fullmodel, test = "Chisq")
dropSex <- anova(modSex, fullmodel, test = "Chisq")
dropRace <- anova(modRace, fullmodel, test = "Chisq")
dropAlcohol
## Analysis of Deviance Table
##
## Model 1: usevar ~ Cigarette + Sex + Race
## Model 2: usevar ~ Alcohol + Cigarette + Sex + Race
## Resid. Df Resid. Dev Df Deviance Pr(>Chi)
## 1 12 95.478
## 2 11 4.694 1 90.784 < 2.2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
The drop-in-deviance test gives a p-value below 2.2e-16, so Alcohol significantly improves the model. The model without the alcohol predictor fits much worse (residual deviance 95.478 versus 4.694).
dropCigarette
## Analysis of Deviance Table
##
## Model 1: usevar ~ Alcohol + Sex + Race
## Model 2: usevar ~ Alcohol + Cigarette + Sex + Race
## Resid. Df Resid. Dev Df Deviance Pr(>Chi)
## 1 12 502.23
## 2 11 4.69 1 497.53 < 2.2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
The drop-in-deviance test gives a p-value below 2.2e-16, so Cigarette significantly improves the model. The model without the cigarette predictor fits much worse (residual deviance 502.23 versus 4.69).
dropRace
## Analysis of Deviance Table
##
## Model 1: usevar ~ Alcohol + Cigarette + Sex
## Model 2: usevar ~ Alcohol + Cigarette + Sex + Race
## Resid. Df Resid. Dev Df Deviance Pr(>Chi)
## 1 12 6.8890
## 2 11 4.6939 1 2.1952 0.1384
The drop-in-deviance test gives a p-value of 0.1384, so Race does not significantly improve the model at the 0.05 level. The model without the Race predictor fits about as well as the full model.
dropSex
## Analysis of Deviance Table
##
## Model 1: usevar ~ Alcohol + Cigarette + Race
## Model 2: usevar ~ Alcohol + Cigarette + Sex + Race
## Resid. Df Resid. Dev Df Deviance Pr(>Chi)
## 1 12 15.0567
## 2 11 4.6939 1 10.363 0.001286 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
The drop-in-deviance test gives a p-value of 0.001286, so Sex significantly improves the model. Its contribution (a deviance drop of 10.363) is much smaller than that of Alcohol or Cigarette, but the model is still better with the sex predictor than without it.
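The four p-values above can be checked by hand, because each one is the upper tail of a chi-squared distribution evaluated at the drop in deviance. This is a small sketch that re-uses the residual deviances printed in the tables (typed in manually, so it does not need the data file). The variable names are my own.

```r
# Residual deviances copied from the four anova() tables
dev_reduced <- c(Alcohol = 95.478, Cigarette = 502.23, Race = 6.8890, Sex = 15.0567)
dev_full <- 4.6939

# Each reduced model drops one parameter, so every test has 1 degree of freedom
drop_in_dev <- dev_reduced - dev_full
p_values <- pchisq(drop_in_dev, df = 1, lower.tail = FALSE)
signif(p_values, 4)
```

A small p-value means the full model fits significantly better, so the dropped variable should stay in the model. With the fitted model at hand, drop1(fullmodel, test = "Chisq") should run the same four single-term deletions in one call.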
To sum up, alcohol and cigarette use are the two strongest statistically significant predictors. Sex is also significant, while race is not. We can conclude that students who use cigarettes or alcohol tend to use marijuana as well. Holding the other predictors constant, alcohol use multiplies the odds of marijuana use by 19.832, and cigarette use multiplies them by 17.447. In my opinion, relating this to the real-life problem, the best way to decrease the number of students using marijuana is to decrease cigarette smoking and alcohol drinking, as well as the use of other substances.
# Each function returns the fitted log odds of one food type relative to fish, e.g. logPI gives log(PI/Pf)
logPI <- function(small,Hancock,Oklawaha,Trafford){-1.55+1.46*small-1.66*Hancock+0.94*Oklawaha+1.12*Trafford}
logPR <- function(small,Hancock,Oklawaha,Trafford){-3.31-0.35*small+1.24*Hancock+2.46*Oklawaha+2.94*Trafford}
logPB <- function(small,Hancock,Oklawaha,Trafford){-2.09-0.63*small+0.70*Hancock-0.65*Oklawaha+1.09*Trafford}
logPO <- function(small,Hancock,Oklawaha,Trafford){-1.90+0.33*small+0.83*Hancock+0.01*Oklawaha+1.52*Trafford}
exp(logPI(0,0,0,0))
## [1] 0.212248
exp(logPR(0,0,0,0))
## [1] 0.03651617
exp(logPB(0,0,0,0))
## [1] 0.1236871
exp(logPO(0,0,0,0))
## [1] 0.1495686
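# Odds relative to fish for a large alligator at Hancock
I <- exp(logPI(0,1,0,0))
R <- exp(logPR(0,1,0,0))
B <- exp(logPB(0,1,0,0))
O <- exp(logPO(0,1,0,0))
fodds <- (1/(I + R + B + O)) ## odds of fish versus all other food types combined
pf <- fodds/(1+fodds) ## probability of eating fish
pI <- I*pf ## probability of eating invertebrates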
pI
## [1] 0.02294781
pR <- R*pf ## probability of eating reptiles
pR
## [1] 0.07175247
pB <- B*pf ## probability of eating birds
pB
## [1] 0.1416306
pO <- O*pf ## probability of eating other food
pO
## [1] 0.1950434
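The same calculation can be wrapped up so that it returns all five probabilities at once. The sketch below is only an illustration: the coefficient matrix is copied from the four functions above, the helper name diet_probs is hypothetical, and reading B as "bird" is my assumption about the labels.

```r
# Coefficients of the four log-odds equations (each versus fish), one row per food type
# Columns: intercept, small, Hancock, Oklawaha, Trafford
coefs <- rbind(invertebrate = c(-1.55,  1.46, -1.66,  0.94, 1.12),
               reptile      = c(-3.31, -0.35,  1.24,  2.46, 2.94),
               bird         = c(-2.09, -0.63,  0.70, -0.65, 1.09),  # "bird" label is assumed
               other        = c(-1.90,  0.33,  0.83,  0.01, 1.52))

# Hypothetical helper: probabilities of all five food types for one alligator
diet_probs <- function(small, Hancock, Oklawaha, Trafford) {
  x <- c(1, small, Hancock, Oklawaha, Trafford)
  odds_vs_fish <- exp(drop(coefs %*% x))   # odds of each type relative to fish
  p_fish <- 1 / (1 + sum(odds_vs_fish))    # fish is the baseline category
  c(fish = p_fish, odds_vs_fish * p_fish)
}

probs <- diet_probs(0, 1, 0, 0)  # large alligator at Hancock
round(probs, 4)
sum(probs)                       # the five probabilities should add up to 1
```

Dividing by 1 plus the sum of the odds is the same step as the fodds and pf lines above, just written in one expression, and it guarantees that the five probabilities sum to 1.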
In every lake, the odds of eating invertebrates rather than fish are 4.306 times as high for small alligators as for large alligators.
exp(logPI(1,0,0,0))/exp(logPI(0,0,0,0))
## [1] 4.30596
exp(logPI(1,1,0,0))/exp(logPI(0,1,0,0))
## [1] 4.30596
exp(logPI(1,0,1,0))/exp(logPI(0,0,1,0))
## [1] 4.30596
exp(logPI(1,0,0,1))/exp(logPI(0,0,0,1))
## [1] 4.30596
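The reason all four ratios agree can be written out directly. This is a short worked derivation in my own notation: \beta_{\text{lake}} stands for whichever lake coefficient applies in the invertebrate equation (0 for the reference lake, -1.66 for Hancock, 0.94 for Oklawaha, 1.12 for Trafford). Because the model has no interaction between size and lake, the lake term appears in both the numerator and the denominator and cancels:

```latex
\frac{\text{odds}(\text{small},\ \text{lake})}{\text{odds}(\text{large},\ \text{lake})}
  = \frac{\exp(-1.55 + 1.46 + \beta_{\text{lake}})}{\exp(-1.55 + \beta_{\text{lake}})}
  = \exp(1.46) \approx 4.306
```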
This computes the log odds of an alligator eating primarily invertebrates rather than reptiles.
log(PI/PR) = log(PI) - log(PR) = log(PI) - log(Pf) - log(PR) + log(Pf) = log(PI/Pf) - log(PR/Pf)
Substituting the fitted equations for log(PI/Pf) and log(PR/Pf) and subtracting them term by term gives the final equation:
log(PI/PR) = 1.76 + 1.81*I(small) - 2.90*I(Hancock) - 1.52*I(Oklawaha) - 1.82*I(Trafford)
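The term-by-term subtraction is easy to get wrong by hand, so here is a small sketch that does it in R. The coefficient vectors are copied from the logPI and logPR functions defined earlier, and the vector names are my own.

```r
# Coefficients of log(PI/Pf) and log(PR/Pf), in the order:
# intercept, small, Hancock, Oklawaha, Trafford
terms <- c("intercept", "small", "Hancock", "Oklawaha", "Trafford")
inv_vs_fish <- setNames(c(-1.55,  1.46, -1.66, 0.94, 1.12), terms)
rep_vs_fish <- setNames(c(-3.31, -0.35,  1.24, 2.46, 2.94), terms)

# log(PI/PR) = log(PI/Pf) - log(PR/Pf), so subtract coefficient by coefficient
inv_vs_rep <- inv_vs_fish - rep_vs_fish
round(inv_vs_rep, 2)
```

Each entry of inv_vs_rep is the coefficient of the matching term in the invertebrates-versus-reptiles equation.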
teststat <- 1.46/.4 ## Wald z statistic: estimate for small divided by its standard error
pnorm(teststat, lower.tail = FALSE)
## [1] 0.0001311202
The p-value for small is 0.0001311 (one-sided; the two-sided value is twice that, about 0.00026), which is below the significance level of .05. We reject the null hypothesis that there is no difference in the log odds of eating invertebrates over fish between large and small alligators.
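For reference, this is a minimal sketch of the same Wald test showing the one-sided and two-sided p-values side by side. The estimate 1.46 and the standard error 0.4 are the values used in the calculation above. The variable names are my own.

```r
estimate  <- 1.46  # coefficient of small in the invertebrates-versus-fish equation
std_error <- 0.4   # standard error used in the calculation above

z <- estimate / std_error
p_one_sided <- pnorm(z, lower.tail = FALSE)           # alternative: coefficient > 0
p_two_sided <- 2 * pnorm(abs(z), lower.tail = FALSE)  # alternative: coefficient != 0
c(z = z, one_sided = p_one_sided, two_sided = p_two_sided)
```

Both p-values are far below .05, so the conclusion does not depend on which alternative hypothesis is used.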