This chapter described some of the most common generalized linear models, those used to model counts. It is important to never convert counts to proportions before analysis, because doing so destroys information about sample size. A fundamental difficulty with these models is that parameters are on a different scale, typically log-odds (for binomial) or log-rate (for Poisson), than the outcome variable they describe. Therefore computing implied predictions is even more important than before.
Place each answer inside the code chunk (grey box). The code chunks should contain a text response or a code that completes/answers the question or activity requested. Make sure to include plots if the question requests them. Problems are labeled Easy (E), Medium (M), and Hard(H).
Finally, upon completion, name your final output .html file as: YourName_ANLY505-Year-Semester.html and publish the assignment to your R Pubs account and submit the link to Canvas. Each question is worth 5 points.
11E1. If an event has probability 0.35, what are the log-odds of this event?
log_odds <- log(0.35/(1-0.35))
log_odds
## [1] -0.6190392
11E2. If an event has log-odds 3.2, what is the probability of this event?
prob <- 1/(1+exp(-3.2))
prob
## [1] 0.9608343
11E3. Suppose that a coefficient in a logistic regression has value 1.7. What does this imply about the proportional change in odds of the outcome?
coeff <- exp(1.7)
coeff
## [1] 5.473947
11E4. Why do Poisson regressions sometimes require the use of an offset? Provide an example.
#Poisson regressions are measured as events per unit time so in order to make sure they're on scale, an offset is required sometimes.Weather forecast is a better example.
11M1. As explained in the chapter, binomial data can be organized in aggregated and disaggregated forms, without any impact on inference. But the likelihood of the data does change when the data are converted between the two formats. Can you explain why?
#Because the multiplier variable c(n,m) is also changed into a constant.
11M2. If a coefficient in a Poisson regression has value 1.7, what does this imply about the change in the outcome?
exp(1.7)
## [1] 5.473947
#It implies that the change in outcome will increase by 5.47 times
11M3. Explain why the logit link is appropriate for a binomial generalized linear model.
#Binomial generalized linear model will have a binary outcome which has a probability between 0 and 1, and logit link will satisfy this criteria.
11M4. Explain why the log link is appropriate for a Poisson generalized linear model.
#Poisson generalized linear model will have a positive outcome and logit link will be between 0 and 1 and thus satisfying the criteria.
11M5. What would it imply to use a logit link for the mean of a Poisson generalized linear model? Can you think of a real research problem for which this would make sense?
#It implies that the outcome of the poisson generalized linear model will be positive and between 0 and 1
11M6. State the constraints for which the binomial and Poisson distributions have maximum entropy. Are the constraints different at all for binomial and Poisson? Why or why not?
# The constraints are different because of the different distributions. The constraints of the binomial and poisson distributions are 1) For binomial: Fixed probability and binary events 2) For poisson: variance must be equal to expected values.
11M7. Use quap to construct a quadratic approximate posterior distribution for the chimpanzee model that includes a unique intercept for each actor, m11.4 (page 330). Plot and compare the quadratic approximation to the posterior distribution produced instead from MCMC. Can you explain both the differences and the similarities between the approximate and the MCMC distributions? Relax the prior on the actor intercepts to Normal(0,10). Re-estimate the posterior using both ulam and quap. Plot and compare the posterior distributions. Do the differences increase or decrease? Why?
library("rethinking")
data("chimpanzees")
d<- chimpanzees
d$recipient <- NULL
q2 <- map(alist(
pulled_left ~ dbinom( 1 , p ) ,
logit(p) <- a[actor] + (bp + bpC*condition)*prosoc_left ,
a[actor] ~ dnorm(0,10),
bp ~ dnorm(0,10),
bpC ~ dnorm(0,10)
) ,
data=d)
pairs(q2)
## Both look almost similar.
11M8. Revisit the data(Kline) islands example. This time drop Hawaii from the sample and refit the models. What changes do you observe?
data(Kline)
d <- Kline
d$P <- scale( log(d$population) )
d$id <- ifelse( d$contact=="high" , 2 , 1 )
d
## culture population contact total_tools mean_TU P id
## 1 Malekula 1100 low 13 3.2 -1.291473310 1
## 2 Tikopia 1500 low 22 4.7 -1.088550750 1
## 3 Santa Cruz 3600 low 24 4.0 -0.515764892 1
## 4 Yap 4791 high 43 5.0 -0.328773359 2
## 5 Lau Fiji 7400 high 33 5.0 -0.044338980 2
## 6 Trobriand 8000 high 19 4.0 0.006668287 2
## 7 Chuuk 9200 high 40 3.8 0.098109204 2
## 8 Manus 13000 low 28 6.6 0.324317564 1
## 9 Tonga 17500 high 55 5.4 0.518797917 2
## 10 Hawaii 275000 low 71 6.6 2.321008320 1
d <- subset(d, d$culture != "Hawaii")
d$P <- scale( log(d$population) )
d$id <- ifelse( d$contact=="high" , 2 , 1 )
d
## culture population contact total_tools mean_TU P id
## 1 Malekula 1100 low 13 3.2 -1.6838108 1
## 2 Tikopia 1500 low 22 4.7 -1.3532297 1
## 3 Santa Cruz 3600 low 24 4.0 -0.4201043 1
## 4 Yap 4791 high 43 5.0 -0.1154764 2
## 5 Lau Fiji 7400 high 33 5.0 0.3478956 2
## 6 Trobriand 8000 high 19 4.0 0.4309916 2
## 7 Chuuk 9200 high 40 3.8 0.5799580 2
## 8 Manus 13000 low 28 6.6 0.9484740 1
## 9 Tonga 17500 high 55 5.4 1.2653019 2
11H1. Use WAIC or PSIS to compare the chimpanzee model that includes a unique intercept for each actor, m11.4 (page 330), to the simpler models fit in the same section. Interpret the results.
data("chimpanzees")
d <- chimpanzees
m11.1 <- map(
alist(
pulled_left ~ dbinom(1, p),
logit(p) <- a ,
a ~ dnorm(0,10)
),
data=d )
m11.2 <- map(
alist(
pulled_left ~ dbinom(1, p) ,
logit(p) <- a + bp*prosoc_left ,
a ~ dnorm(0,10) ,
bp ~ dnorm(0,10)
),
data=d )
m11.3 <- map(
alist(
pulled_left ~ dbinom(1, p) ,
logit(p) <- a + (bp + bpC*condition)*prosoc_left ,
a ~ dnorm(0,10) ,
bp ~ dnorm(0,10) ,
bpC ~ dnorm(0,10)
), data=d )
m11.4 <- map(
alist(
pulled_left ~ dbinom(1, p),
logit(p) <- a[actor] + (bp + bpC*condition)*prosoc_left,
a[actor] ~ dnorm(0, 10),
bp ~ dnorm(0, 10),
bpC ~ dnorm(0, 10)
),
data = d)
compare(m11.1,m11.2,m11.3,m11.4)
## WAIC SE dWAIC dSE pWAIC weight
## m11.4 552.9589 18.467085 0.0000 NA 17.6123061 1.000000e+00
## m11.2 680.5950 9.367326 127.6362 17.98441 2.0490500 1.923808e-28
## m11.3 682.4934 9.299811 129.5345 17.88340 3.0732792 7.446253e-29
## m11.1 687.8568 7.137769 134.8979 18.76669 0.9583371 5.096719e-30