This chapter described some of the most common generalized linear models, those used to model counts. It is important to never convert counts to proportions before analysis, because doing so destroys information about sample size. A fundamental difficulty with these models is that parameters are on a different scale, typically log-odds (for binomial) or log-rate (for Poisson), than the outcome variable they describe. Therefore computing implied predictions is even more important than before.
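# A quick numeric illustration of the scale issue described above, using made-up
# parameter values: implied predictions on the outcome scale come from pushing
# the linear model through the inverse link, the inverse logit for binomial
# models and exp() for Poisson models.
x = 1
1 / ( 1 + exp( -(0.5 - 0.3*x) ) )   # binomial: implied probability of the outcome
exp( 3.0 + 0.2*x )                  # Poisson: implied rate (expected count)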
Place each answer inside the code chunk (grey box). Each code chunk should contain a text response or code that completes or answers the question or activity requested. Make sure to include plots if the question requests them.
Finally, upon completion, name your final output .html file as YourName_ANLY505-Year-Semester.html, publish the assignment to your R Pubs account, and submit the link to Canvas.
Each question is worth 5 points.
11-1. As explained in the chapter, binomial data can be organized in aggregated and disaggregated forms, without any impact on inference. But the likelihood of the data does change when the data are converted between the two formats. Can you explain why?
# The two formats imply different likelihood functions. In the aggregated form, each row's likelihood includes the multiplicity term (the binomial coefficient) that counts how many orderings of successes and failures could produce the same total; in the disaggregated 0/1 form this term is absent, because each row is a single Bernoulli trial. That extra term is a constant with respect to the parameters, so inference about the parameters is unchanged, but the value of the likelihood itself (and of anything built from it, such as WAIC or PSIS) differs between the two formats.
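# A minimal numeric illustration with made-up counts (6 successes in 9 trials, p = 0.5):
# the aggregated likelihood carries the choose(9, 6) multiplicity term that the
# disaggregated Bernoulli product lacks, and their ratio does not depend on p.
p = 0.5
aggregated    = dbinom( 6 , size = 9 , prob = p )   # includes the binomial coefficient
disaggregated = p^6 * (1 - p)^3                     # product of 9 Bernoulli likelihoods
aggregated / disaggregated                          # choose(9, 6) = 84, whatever p is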
11-2. Use quap to construct a quadratic approximate joint posterior distribution for the chimpanzee model that includes a unique intercept for each actor, m11.4 (page 330). Plot and compare the quadratic approximation to the joint posterior distribution produced instead from MCMC. Do not use the ‘pairs’ plot. Can you explain both the differences and the similarities between the approximate and the MCMC distributions? Relax the prior on the actor intercepts to Normal(0,10). Re-estimate the posterior using both ulam and quap. Plot and compare the posterior distributions. Do not use the ‘pairs’ plot. Do the differences increase or decrease? Why?
data("chimpanzees")
d = chimpanzees
d$recipient = NULL
q2 = map(alist(
pulled_left ~ dbinom( 1 , p ) ,
logit(p) <- a[actor] + (bp + bpC*condition)*prosoc_left ,
a[actor] ~ dnorm(0,10),
bp ~ dnorm(0,10),
bpC ~ dnorm(0,10)
) ,
data=d)
pairs(q2)
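# A minimal sketch of the MCMC counterpart that the comparison below refers to,
# assuming rethinking's ulam() and a working Stan toolchain; it reuses the same
# formula and Normal(0, 10) priors as the quap fit above.
dat = list(
  pulled_left = d$pulled_left,
  actor       = d$actor,
  prosoc_left = d$prosoc_left,
  condition   = d$condition )
q2_mcmc = ulam(
  alist(
    pulled_left ~ dbinom( 1 , p ),
    logit(p) <- a[actor] + (bp + bpC*condition)*prosoc_left,
    a[actor] ~ dnorm( 0 , 10 ),
    bp ~ dnorm( 0 , 10 ),
    bpC ~ dnorm( 0 , 10 )
  ),
  data = dat , chains = 4 )
plot( coeftab( q2 , q2_mcmc ) )    # side-by-side posterior summaries without a pairs plot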
# With the original, tighter prior on the actor intercepts, the quadratic approximation and the MCMC posterior look very similar: the posterior means line up closely and the MCMC standard deviations are only slightly larger, because those posteriors are approximately Gaussian. After relaxing the prior to Normal(0,10), as in the code above, the differences increase, most visibly for actor 2, who always pulled the left lever: the likelihood pushes that intercept toward very large values, giving a strongly skewed posterior that MCMC can represent but that quap forces into a symmetric Gaussian. So the two approaches agree where the posterior is close to Gaussian and diverge where it is not.
11-3. Revisit the data(Kline) islands example. This time drop Hawaii from the sample and refit the models. Plot the joint posterior. What changes do you observe?
data(Kline)
d = Kline
d$P = scale( log(d$population) )
d$id = ifelse( d$contact=="high" , 2 , 1 )
d
## culture population contact total_tools mean_TU P id
## 1 Malekula 1100 low 13 3.2 -1.291473310 1
## 2 Tikopia 1500 low 22 4.7 -1.088550750 1
## 3 Santa Cruz 3600 low 24 4.0 -0.515764892 1
## 4 Yap 4791 high 43 5.0 -0.328773359 2
## 5 Lau Fiji 7400 high 33 5.0 -0.044338980 2
## 6 Trobriand 8000 high 19 4.0 0.006668287 2
## 7 Chuuk 9200 high 40 3.8 0.098109204 2
## 8 Manus 13000 low 28 6.6 0.324317564 1
## 9 Tonga 17500 high 55 5.4 0.518797917 2
## 10 Hawaii 275000 low 71 6.6 2.321008320 1
d = subset(d, d$culture != "Hawaii")
d$P = scale( log(d$population) )
d$id = ifelse( d$contact=="high" , 2 , 1 )
d
## culture population contact total_tools mean_TU P id
## 1 Malekula 1100 low 13 3.2 -1.6838108 1
## 2 Tikopia 1500 low 22 4.7 -1.3532297 1
## 3 Santa Cruz 3600 low 24 4.0 -0.4201043 1
## 4 Yap 4791 high 43 5.0 -0.1154764 2
## 5 Lau Fiji 7400 high 33 5.0 0.3478956 2
## 6 Trobriand 8000 high 19 4.0 0.4309916 2
## 7 Chuuk 9200 high 40 3.8 0.5799580 2
## 8 Manus 13000 low 28 6.6 0.9484740 1
## 9 Tonga 17500 high 55 5.4 1.2653019 2
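# A minimal sketch of refitting the chapter's Poisson interaction model on the
# data without Hawaii (priors assumed from the chapter: Normal(3, 0.5) intercepts,
# Normal(0, 0.2) slopes); running the same call on the full data gives the
# comparison fit that includes Hawaii.
dat2 = list(
  T   = d$total_tools,
  P   = as.numeric(d$P),
  cid = d$id )
m_noHawaii = ulam(
  alist(
    T ~ dpois( lambda ),
    log(lambda) <- a[cid] + b[cid]*P,
    a[cid] ~ dnorm( 3 , 0.5 ),
    b[cid] ~ dnorm( 0 , 0.2 )
  ),
  data = dat2 , chains = 4 , log_lik = TRUE )
precis( m_noHawaii , depth = 2 )
# Joint posterior of intercept and slope for each contact group.
post = extract.samples( m_noHawaii )
plot( post$a[,1] , post$b[,1] , col = col.alpha("black", 0.2) ,
      xlab = "intercept" , ylab = "slope" )
points( post$a[,2] , post$b[,2] , col = col.alpha("blue", 0.2) )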
# Dropping Hawaii removes by far the most influential observation (it has the largest population and the most tools), and the standardized log-population scores are recomputed on the reduced sample, which is why the P column changes for every society. Refitting the models without Hawaii, the posterior relationship between population and total tools becomes noticeably less steep for the low-contact societies, and the low- and high-contact trends end up looking much more alike, since it was largely Hawaii that pulled the low-contact trend upward.
11-4. Use WAIC or PSIS to compare the chimpanzee model that includes a unique intercept for each actor, m11.4 (page 330), to the simpler models fit in the same section. Interpret the results.
data("chimpanzees")
d = chimpanzees
m11.1 = map(
alist(
pulled_left ~ dbinom(1, p),
logit(p) <- a ,
a ~ dnorm(0,10)
),
data=d )
m11.2 = map(
alist(
pulled_left ~ dbinom(1, p) ,
logit(p) <- a + bp*prosoc_left ,
a ~ dnorm(0,10) ,
bp ~ dnorm(0,10)
),
data=d )
m11.3 = map(
alist(
pulled_left ~ dbinom(1, p) ,
logit(p) <- a + (bp + bpC*condition)*prosoc_left ,
a ~ dnorm(0,10) ,
bp ~ dnorm(0,10) ,
bpC ~ dnorm(0,10)
), data=d )
m11.4 = map(
alist(
pulled_left ~ dbinom(1, p),
logit(p) <- a[actor] + (bp + bpC*condition)*prosoc_left,
a[actor] ~ dnorm(0, 10),
bp ~ dnorm(0, 10),
bpC ~ dnorm(0, 10)
),
data = d)
compare(m11.1,m11.2,m11.3,m11.4)
## WAIC SE dWAIC dSE pWAIC weight
## m11.4 550.7035 18.588945 0.0000 NA 15.6127887 1.000000e+00
## m11.2 680.4658 9.271222 129.7624 18.02099 1.9843982 6.644504e-29
## m11.3 681.9773 9.369869 131.2738 17.96985 2.8181680 3.120699e-29
## m11.1 687.8718 7.087756 137.1683 18.84570 0.9656917 1.637854e-30
# By WAIC, m11.4 is clearly the best of the four models: its WAIC is roughly 130 points lower than the next-best model, far more than the standard error of that difference (about 18), and it receives essentially all of the model weight. Its larger pWAIC (about 15.6) simply reflects the extra actor-intercept parameters. In other words, differences among the individual actors, their handedness in particular, explain most of the variation in pulled_left, so the model that accounts for them dominates the simpler ones.
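# The same comparison can be run with PSIS instead of WAIC through compare()'s
# func argument (assuming the installed rethinking version supports it); the
# ranking of the four models is expected to be unchanged.
compare( m11.1 , m11.2 , m11.3 , m11.4 , func = PSIS )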
11-5. Explain why the logit link is appropriate for a binomial generalized linear model?
# In a binomial model the parameter being modeled is a probability, so it must lie between 0 and 1, while the linear model a + b*x can take any real value. The logit link maps that probability scale, the interval (0, 1), onto the whole real line, and its inverse (the logistic function) maps any value of the linear model back into (0, 1). This guarantees that the linear predictor can never imply an impossible probability, which is why the logit link is the natural choice for a binomial generalized linear model.
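# A small numeric check of the mapping described above (the probability and
# linear-model values are arbitrary examples): logit sends probabilities to the
# real line, and the inverse logit sends any linear-model value back to (0, 1).
p   = c( 0.01 , 0.25 , 0.50 , 0.75 , 0.99 )
log( p / (1 - p) )              # logit: (0, 1) -> whole real line
eta = c( -5 , -1 , 0 , 1 , 5 )
1 / ( 1 + exp(-eta) )           # inverse logit: real line -> (0, 1)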